The Journal of 


Experimental Education 


A periodical report of scientific investigations relating to child development, 


Volume XXIV September, 1955 


CONTENTS 
Page 


A Study of Professional Distances Between the Raters of Teachers and 
Teachers Rated Earl Martin Grotke 1 


An Investigation of the New York State Regents Examinations in Science 
George Greisen Mallinson and Jacqueline V. Buck 43 


A Comparison of Wechsler Children’s Scale and Stanford-Binet Scores for 
Eight- and Nine-Year Olds Frank C. Arnold 91 


PUBLISHED QUARTERLY 


Published by Dembar Publications, Inc., 
Madison 3, Wisconsin. 
Entered as second-class matter October 17, 1938 at the post office at Madison, 
Wisconsin, under the act of March 3, 1879, 


Number 1 
$5.00 A YEAR        $1.50 A COPY 


EDITORIAL BOARD 


A. S. Barr, Chairman, Professor of Education, University of Wisconsin, Madison 6, Wis. 








A STUDY OF PROFESSIONAL DISTANCES 


BETWEEN THE RATERS OF TEACHERS 
AND TEACHERS RATED* 


EARL MARTIN GROTKE 
University of Southern California 


The Problem 


THIS STUDY attempts to show the relationship 
between the attitudes of raters and ratees and the 
ratings given to teachers. It is hypothesized that 
the lengths of ‘‘professional distances’’ between 
raters and ratees increase as teacher ratings 
decrease from good to average and from average to 
poor. 

Definition of Professional Distance.—The 
concept of professional distance is adapted from 
the concept of social distance used in the field 
of sociology. In 1925, while studying racial 
attitudes, Bogardus devised an attitude scale 
called social distance. ‘‘He was interested in 
measuring degrees to which individual repre- 
sentatives of various racial and national groups 
were accepted or rejected.... Instead of mak- 
ing a distinction between favorable and unfavor- 
able attitudes, however, he conceived the prob- 
lem in terms of degrees of ‘distance’ which his 
subjects wished to keep between themselves and 
members of other groups. The more unfavor- 
able the attitude, from this point of view, the 
greater the social distance, and the more favor- 
able the attitude, the less the social distance. 
Thus, the social distance between two intimate 
friends would be zero, and at the other extreme 
the attitude of a rabid anti-Semite toward Jews 
would represent maximum social distance.’’ 
(23:164) 

As applied to the field of teacher evaluation, 
professional distance refers to the frequency of 
disagreements and divergency between two 
professional workers on what constitutes the 
professional role of the good teacher. Professional 
distance may be illustrated simply as follows: 
Worker A believes a teacher should keep her pupils 
absolutely quiet during class time; Worker B 
believes pupils should be permitted considerable 
freedom. Such disagreements (and agreements) on 
the professional role of the good teacher 
constitute professional distance. As one increases 
the number of disagreements between 
any two professional workers, the professional 
distance increases. The greater the frequen- 
cies, the longer the professional distance. 

The second aspect of the definition of pro- 
fessional distance suggests the measurement 
of the degree of divergency between the points of 
view of two professional workers. This aspect 
may be illustrated by Worker A stating that she 
definitely believes a good teacher stands in front 
of the class when teaching; Worker B says she 
has no convictions on this issue, and Worker C 
says she definitely believes a good teacher stands 
in the rear of the room. Such a disagreement 
suggests that the professional distance is longer 
between Workers A and C than it is between A 
and B or B and C. The more divergent the 
points of view, the longer the professional dis- 
tance; the less divergent the points of view, the 
shorter the professional distance. 

Definition of the Professional Role of the 
Good Teacher. —The concept of the profession- 
al role of the good teacher also is an adaptation 
from the field of sociology. It is adapted from 
the term ‘‘social role’’ which Cuber defines as 
‘‘the culturally defined patterns of behavior 
expected or required of persons in specific social 
positions.... behavior as used in this definition 
includes... both overt acts and covert behaviors 
such as attitudes, values, and ideas.’’ (10:232) 
Professional role may be similarly defined as 
the professionally determined behaviors expected 
or required of persons in a specific professional 
position, i.e., the position of classroom 
teacher. The role of teacher requires both co- 
vert and overt behaviors. In general, the covert 
behaviors may include desiring to teach, know- 
ing subject matter, and believing in democracy. 
The overt behaviors are conducting lessons, 
manipulating teaching tools, and preparing reports. 

When defining the behaviors required of per- 
sons playing the professional role of the good 


*From the author's Ph.D. dissertation, University of Wisconsin, 1952; A. S. Barr, advisor. 


SECTION I 




teacher in contrast to the professional role of 
the poor teacher, one synthesizes all the best 
illustrations of ‘‘good’’ teaching he has seen or 
heard or read about. Teacher practices such as 
keeping the children absolutely quiet during class 
time, having them fold their hands while they lis- 
ten to the teacher, and measuring the results of 
learning exclusively with standardized tests may 
be learned as ‘‘good’’ activities. On the other 
hand, dividing the class into small groups to 
work on individually chosen tasks, permitting as 
much freedom as possible, and measuring the 
results of learning by observing cooperative 
behavior patterns may be learned as ‘‘good’’ 
behaviors by a second professional worker who 
may be a rater of teachers. Likewise the per- 
sonal traits that ‘‘good’’ teachers have, the mod- 
ulated voice, the social poise, the ethical stand- 
ards, all are learned by each professional work- 
er to form his own concept of the ‘‘good’’ teacher 
and required of any person who would play the 
role of his ‘‘good’’ teacher. 


How Concepts of the Professional Role of the 
Good Teacher Function in Teacher Evaluation. 
—‘‘Appraisal of any kind may be defined as an 
act of judgment, in which the judging implies 
both a criterion—a standard of some sort—and 
a pertinent description of what is being judged.’’ 
(18:172) In the field of teacher evaluation the 
criterion is one’s concept of the professional 
role of the good teacher or some aspect of it; 
the pertinent description is a concept of the teach- 
er being judged, or some aspect of her perform- 
ance in the role of teacher. When a teacher 
evaluates herself, she compares what she 
thinks she is to what she thinks she should be, 
i.e., her concept of the professional role of the 
good teacher. As a result of her comparison, 
she arrives at a qualitative and/or quantitative 
expression representing the distance between 
her two concepts. When evaluations are made 
by a person other than the teacher, the evaluator 
compares his concept of the teacher’s performance 
with his concept of the professional 
role of his ‘‘good’’ teacher. His comparison al- 
so results in a qualitative and/or quantitative 
expression representing the distance between 
his two concepts. From this point of view, all 
measurement can be thought of as expressions 
of distance between the criterion and the con- 
cept of what is being evaluated. Concepts of the 
professional role of the good teacher function as 
the criterion in teacher evaluation. 


How Concepts of Professional Distance Function 
in Teacher Evaluation.—When professional 
distance—that is, disagreements between two 
professional workers on what constitutes the 
professional role of the good teacher—exists 
between the teacher and her evaluator, their 
evaluations are likely to be different because 
their criteria are different. Doing an excellent 
job of teaching in the eyes of the teacher is 
approximating her own concept of good teaching. If 
her concept of good teaching is decidedly 
different from the concept of good teaching held 
by her rater, i.e., the professional distance 
between them is long, the evaluation that the 
rater may give her is apt to be poor. If, on the 
other hand, her concept of good teaching is 
similar to that of her rater, i.e., the 
professional distance between them is short, the 
evaluation that the rater may give her is apt to 
be good. Thus it is hypothesized that the length 
of professional distance increases as teacher 
ratings decrease from good to average and from 
average to poor. 

The Measurement of Professional Distance. 
—Professional distance suggests the comparison of 
the concepts of the professional role of the good 
teacher as held by any two professional workers. 
To make such comparisons, instruments were 
constructed to ascertain the overt and covert 
behaviors each professional worker expects from 
his ‘‘good’’ teacher. Specific teaching practices, 
teacher factors, and beliefs related to education 
were selected to appear on the instruments. 
Subjects were asked to respond by classifying each 
practice as good or poor; each factor as important 
or insignificant; and each statement of belief as 
one which they definitely believe or one that they 
definitely do not believe. Step intervals were 
provided for indicating in-between positions. 
Since distances between the teachers and their 
raters were sought, comparisons were made between 
the responses of the teachers and the responses of 
their raters. For each item on each instrument the 
distance between the two responses was assigned a 
weight value. To arrive at the total distance 
measured by the instrument, the weight values for 
all the items of that instrument were summed.* 
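
The summation just described can be written compactly. The symbols below are illustrative notation assumed here; they do not appear in the original study.

```latex
D = \sum_{i=1}^{n} w_i
```

Here n is the number of items on one instrument and w_i is the weight value assigned to the difference between the teacher's and the rater's responses to item i. Treating an item on which the two responses agree as contributing zero is an assumption of this notation.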

Conditions Under Which the Hypothesis Will Be 
Considered Substantiated.—If the professional 
distance scores are lowest for the teachers rated 
good, higher for the teachers rated average, and 
highest for the teachers rated poor, then the 
hypothesis will be considered to be substantiated, 
and professional distance as measured by these 
instruments may be considered as an indicator of 
professional ratings. If professional distance 
scores appear in some other pattern, the hypothesis 
will be considered as not supported by the 
evidence. 

Summary.—This study attempts to show the pattern 
of the lengths of professional distance as they 
exist between raters and the teachers they rate. 
Professional distance is defined as the number and 
divergency of the disagreements between the 
concepts held by two professional workers on what 
constitutes the professional role of the good 
teacher. Each worker learns from his own unique 
sequence of experiences his concept of the 
professional role of his ‘‘good’’ teacher. One’s 
concept of the professional role of the good 
teacher is used as a criterion to evaluate one’s 
own teaching and the teaching of others. When 
rating others, the resultant evaluations probably 
vary from good to poor as professional distances 
vary from short to long. Special instruments and 
procedures are used to measure professional 
distance. 


*Instruments used in this study are described in detail in Section III. Copies of them appear as Appendices A through C, which will be found in the original thesis on file in the Library, University of Wisconsin, Madison, Wisconsin. Procedures for measuring professional distances are described in Section IV. 


SECTION II 


The Method of Research 


A MODIFIED form of the causal-comparative 
method of research was employed in this 
study. Two phenomena were investigated: one, 
a teacher considered a good teacher; the other, 
a teacher considered a poor teacher. The first 
modification recognized a middle group, called 
average, and believed to be between the two ex- 
tremes.* Therefore, the absence of the first 
phenomenon was teachers rated average or poor; 
the absence of the second phenomenon was teach- 
ers rated average or good. The second modifi- 
cation assumed that circumstances attending the 
presence of the phenomena may exist in degrees, 
i.e., lengths of professional distance. 

Some arbitrary limits were made for the 
study. All subjects were selected from elemen- 
tary school faculties. Teachers in the group 
studied taught between grades one and six. 
Whether the relationship stated in the hypothesis 
exists among junior high and senior high 
school faculties is not a part of the study. An- 
other limitation was made by definition. The 
professional role of the teacher was limited to 
a rather arbitrary set of teaching practices, 
teacher factors, and beliefs related to education. 
While each of the items seemed reasonable at the 
time of adoption, other items and instruments for 
detecting other areas of disagreement between 
professional workers could possibly be devised. 

Application of the Plan of Research. — The 
school systems of two communities were select- 
ed for the study. The first community with a 
population of approximately 100,000 was located 
on the coast of the Gulf of Mexico. When the 
study started in this community, there were 19 
elementary schools for Anglo- and Latin-American 
children. Two of the 19 schools were not accepted 
for the study: one had been established only a few 
weeks before the study was begun; the second was 
staffed by teachers who the principal felt could 
not be considered as either good or poor. Of the 
17 schools selected, 15 were administered by their 
own principals. The other two were administered by 
one principal who felt competent to rate the 
teachers in both schools. 

The second community had a population of 
approximately 70,000 and was located on the coast 
of Lake Michigan. Of the 14 elementary grade 
schools, 13 were selected for the study. One was 
not accepted because the teachers failed to 
cooperate. Among the 13 accepted, one elementary 
school was housed in a building that also housed 
classes for orthopedic and mentally handicapped 
children. The principal in this elementary school 
was administrator for all divisions in his 
building. Two elementary schools were housed in 
buildings along with junior high schools. The 
principals of the elementary schools were also 
principals of the junior high schools. Six small 
schools were administered by three principals, 
each principal serving as the head of two schools. 
Each of these three principals felt competent to 
rate the faculties in each of his schools. Each of 
the other schools was administered by its own 
principal. With the schools from the first 
community, the total group for study consisted of 
30 elementary school faculties administered by 26 
principals. 

The principals of each of the 30 schools served as 
the raters of their teachers. Faculties of the 
schools ranged from 6 to 47 teachers. Each 
principal was asked to select from his staff(s) 
one of the best teachers, one of his average 
teachers, and one of his poor or ineffective 
teachers. These directions were employed so that 
he would use his own criteria in making his 
judgments. The selection of one good, one average, 
and one poor teacher was employed to secure a 
spread of his ratings. There is no claim that the 
teachers were selected on the same basis, nor that 
the teachers making up any one classification have 
anything in common other than their own 
principal’s rating. With the selection of 
teachers, the subjects for the study consisted of 
26 raters of teachers, 30 teachers rated good, 30 
teachers rated average, and 30 teachers rated 
poor. 

All subjects responded to the data gathering 
devices presented to them. On the first instrument 
the subjects classified controversial teacher 
practices as ‘‘good’’, ‘‘poor’’, or ‘‘makes no 
difference’’. On the second instrument the 
subjects classified teacher factors on a 
five-point scale from ‘‘of utmost importance’’ to 
‘‘insignificant’’. On the last instrument the 
subjects indicated their pattern of beliefs 
related to education. The instruments are 
described in detail in Section III. 


*Lanke (19), using a factor analysis technique, was led to believe that perhaps teachers considered average are not necessarily in between the two extremes. 

In each case the subject’s cooperation was 
asked and received. None of the group was told 
the hypothesis being studied. Sufficient time 
was given to permit each person to respond at 
his leisure. Instructions on the method of 
response appeared on each measuring device. The 
qualifications of the persons and their responses 
suggested that the instruments were not misin- 
terpreted. When omissions were considered 
oversights, the subjects were asked to complete 
the instrument. Omissions of one principal and 
three teachers, however, were due to a differ- 
ence in point of view. In these cases it was in- 
ferred from their comments on the margin that 
they considered the items they omitted as ‘‘not 
making any difference’’. Their omitted respon- 
ses were considered as such. These were few 
in number. One requirement specified that per- 
sons responding to the instruments would not 
converse about the study before or during 
the data collecting. Upon collecting the instru- 
ments from the schools, information on compli- 
ance with this request was asked. Responses 
indicated cooperation. 

The study sought the relationship of profes- 
sional distance to teachers’ ratings. To deter- 
mine professional distance, the responses of 
each rater were compared with those of the 
teachers he rated. Weight values were assigned 
to their disagreements. Three analyses of the 
assigned weights were made. Professional dis- 
tance scores were computed by summing the as- 
signed weights. Frequencies of disagreement 
scores were computed by counting the number 
of assigned weights. Item analyses were made 
by computing the professional distance and fre- 
quency of disagreement for each group of teach- 
ers for each item. Scores were compared to 
determine whether they substantiated the hypoth- 
esis. 
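
The first two analyses named above differ only in how the assigned weights are aggregated. The following is a minimal sketch, assuming the weights for one rater-teacher pair are already available as a list of integers in which 0 marks agreement on an item. The function names are hypothetical and the sample values are made up; neither comes from the study.

```python
def professional_distance(weights):
    """Professional distance score: the sum of the assigned weights."""
    return sum(weights)


def frequency_of_disagreement(weights):
    """Frequency score: the number of items that received any weight."""
    return sum(1 for w in weights if w > 0)


# Illustrative weights for five items (made-up values, not study data).
example = [0, 2, 1, 0, 1]
print(professional_distance(example))      # 4
print(frequency_of_disagreement(example))  # 3
```

The same list of weights therefore yields two scores: one sensitive to how far apart the responses are, the other only to whether they differ.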


SECTION III 
Measuring Instruments 


THREE DATA gathering devices were 
constructed and used in the study. They all 
sought to determine the subjects’ concepts of 
the professional role of the good teacher, in 
terms of teacher factors, teaching practices, 
and beliefs related to education.* 


The Evaluation of Teaching Practices.—The first 
of the three instruments dealt with teaching 
practices. Fifty-one of the practices appearing on 
the instrument were extracted from Table XLI, A 
Summary of Theory and Practice in Teaching Social 
Science, in A. S. Barr’s Characteristic 
Differences in the Performance of Good and Poor 
Teachers of the Social Studies. (2:100f) The table 
includes data on the number of experts who 
consider the practice as good and also the number 
of experts who consider the practice as poor. In 
the construction of the instrument only those 
practices were selected on which the experts 
showed a marked degree of disagreement. Those 
practices on which the minority group of experts 
equalled ten or more percent of the majority group 
made up the first 51 items for the instrument. To 
this number of items were added the following two, 
which seemed to be controversial: 


Measures results of learning by changed 
attitudes and behaviors; and 

Measures results of learning by quality of 
pupils’ projects and exercises. 


Subjects were asked to categorize each of the 
total of 53 practices as ‘‘good’’, as ‘‘poor’’, or 
as ‘‘making no difference’’, i.e., neither good 
nor poor. The various methods of ‘‘scoring’’ 
the instrument are described in the following 
chapter, Analysis of Data. 

The Evaluation of Teacher Factors.—An instrument 
to obtain the subjects’ ranking of the importance 
of specific teacher factors was constructed by 
using the teacher factors that appear on the 
official rating scale used by the school system in 
the first community studied. The author of this 
study was also the author of the rating scale. 
Twenty-five teacher factors appearing on both 
devices are divided into four classifications: 
(1) the teacher as a person; (2) the teacher as a 
director of learning; (3) the teacher as a friend 
and counselor of students; and (4) the teacher as 
a member of a professional staff. Such items as 
‘‘physically fit’’, ‘‘emotional stability’’ and 
‘‘good speaking voice’’ appeared in the first 
classification. ‘‘Establishes attainable goals 
cooperatively with students’’, ‘‘Has mastery of 
subject matter’’, and ‘‘Skillful with a variety of 
tests and measurement devices’’ appeared in the 
second classification. ‘‘Builds a sense of 
security and personal worth in all students’’, and 
‘‘Considers the development of the child as an 
individual more important than subject matter 
mastery’’ appeared in the third. ‘‘Guided by 
professional ethics’’ and ‘‘Actively cooperates in 
staff operations’’ appeared in the last 
classification. The subjects were asked to 
categorize each of the 25 factors into one of five 
categories: (1) of utmost importance, (2) very 
important, (3) important, (4) usable, but not 
important, and (5) insignificant. It was assumed 
that those factors deemed important were those 
that the subject required of the person who plays 
the role of his good teacher. The ‘‘scoring’’ 
techniques used for this instrument are described 
in Section V. 


*Copies of instruments will be found in Appendices A, B, C, in original thesis, Library of the University of Wisconsin. 

The Inventory of Beliefs.—An instrument to 
determine which beliefs related to education were 
held by the subjects was constructed for this 
research. It consisted of 12 groups of statements 
of beliefs. Each group contained 10 statements. 
The names of the groups were Teacher-Pupil 
Relationships, Teaching Profession, Community 
Relationships, Objectives of Education, The 
Schools’ Stand on Controversial Issues, Minority 
Groups, Democracy and Government, Economic 
Problems, Organized Labor, Religion, and Life 
Values. Each of the 120 statements began with the 
words, ‘‘I believe that....’’ Illustrations of the 
items are: 


I believe that the public schools have an 
obligation to provide sex education. 

I believe that the teachers who actively 
work for social and economic reforms are 
poorer teachers than those who stick to their 
own subject matter fields. 

I believe that another world war will come 
eventually, regardless of the steps we take to 
prevent it. 

I believe that it is un-American to peacefully 
advocate that the American Government 
should operate all steel, mining, transportation, 
and manufacturing industries. 

I believe that every person will set aside 
his principles when the rewards for doing so 
are high enough. 


Subjects were asked to indicate on a special 
answer sheet their reactions to the statements as 
one of the following: (1) Yes, I definitely believe 
this statement; (2) I am inclined to believe 
this statement; (3) I cannot say whether I believe 
this statement or not because I have not made up 
my mind; (4) I am inclined not to believe this 
statement; and (5) No, I definitely do not believe 
this statement. On this instrument it was 
assumed that the subject’s own belief was the one 
he required for the professional role of his good 
teacher. 

The reliability of the inventory was studied by 
means of a test-re-test procedure. Thirty-four 
members of a class in Teacher Supervision 
responded to the inventory on two occasions with 
one week intervening. Analysis of their re- 
sponses found that: 


The average number of the 120 items on 
which the class gave identical answers a week 
later was 73.8. 

The average number of the 120 items on 
which the class members reversed themselves 
was 15.8. 


‘‘Reversing themselves’’ was defined as indicating 
‘‘Definitely believing’’ or ‘‘Inclined to 
believe’’ during one responding period and 
indicating ‘‘Definitely not believing’’ or 
‘‘Inclined not to believe’’ the same item during 
the other responding period. Changes in response 
may be attributed to the effectiveness of the 
instructions during the intervening week, or to 
the degree of reliability of the instrument. 
Methods for ‘‘scoring’’ the instrument are 
described in a later Section of this report. 
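
The two test-re-test counts defined above can be sketched for a single respondent as follows. This is a sketch under stated assumptions, not the author's procedure: the coding of the five responses as the integers 1 through 5 (in the order they are listed on the answer sheet) and the function name are hypothetical, and the sample responses are made up.

```python
BELIEVE = {1, 2}      # definitely believe, inclined to believe
NOT_BELIEVE = {4, 5}  # inclined not to believe, definitely do not believe


def retest_counts(first, second):
    """Return (identical, reversed) item counts for one respondent."""
    identical = sum(1 for a, b in zip(first, second) if a == b)
    reversed_count = sum(
        1
        for a, b in zip(first, second)
        if (a in BELIEVE and b in NOT_BELIEVE)
        or (a in NOT_BELIEVE and b in BELIEVE)
    )
    return identical, reversed_count


# Made-up responses to six items on two occasions (not study data).
first = [1, 2, 3, 4, 5, 2]
second = [1, 4, 3, 4, 1, 3]
print(retest_counts(first, second))  # (3, 2)
```

Averaging each count over all respondents would give class-level figures of the kind reported above. A shift to or from ‘‘cannot say’’ counts as neither identical nor reversed under this reading.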

Summary.—The three data gathering devices 
used in this study have been described. 
They are (1) the Evaluation of Teaching Prac- 
tices, (2) the Evaluation of Teacher Factors, 
and (3) the Inventory of Beliefs. All sought 
the subjects’ concept of the professional role of 
his good teacher. Methods for ‘‘scoring’’ the 
instruments are described in a later Section of 
this report. 


SECTION IV 


Analysis of Data 


AS HAS already been said, a modified 
form of the causal-comparative method of re- 
search was employed. It was hypothesized that 
the lengths of professional distances increase 
as teacher ratings decrease from good to aver- 
age and from average to poor. Professional 
distance is suggested by the frequency and di- 
vergency of disagreements between the points 
of view on what constitutes the professional role 
of the good teacher. The greater the frequency 
of disagreements, the greater the professional 
distance; the greater the divergency of 
disagreements, the greater the professional 
distance. 

Professional distance ‘‘scores’’ are computed 
for the number and divergency of disagreements 
between each rater and the teachers he rated. 
Scores are associated with the teachers, such as: 
The score of the teacher rated good is 145. 
Actually, the score is not the teacher’s any more 
than it is the rater’s, since it signifies the 
extent of the disagreements between them. However, 
for convenience, throughout this discussion, the 
rather lengthy expression, ‘‘the score for the 
distance between the rater and the teacher he 
rated good’’, is abbreviated to ‘‘A’s score’’. 
Likewise, ‘‘the score for the distance between the 
rater and the teacher he rated average’’ is ‘‘B’s 
score’’, and ‘‘the score for the distance between 
the rater and the teacher he rated poor’’ is ‘‘C’s 
score’’. 

If the professional distance scores are lowest 
for the A teachers, higher for the B teachers, and 
highest for the C teachers, then the hypothesis 
will be considered as supported by the evidence. 
If the professional distance scores appear in some 
other pattern, then the hypothesis will be 
considered as not supported by the evidence. 


TABLE II 
PROFESSIONAL DISTANCE SCORES FOR TEACHER PRACTICES 


Code No.    Teacher       Teacher          Teacher 
School      Rated Good    Rated Average    Rated Poor 

 1.         22            30               18 
 2.         28            17               24 
 3.         21            30               30 
 4.         28            36               36 
 5.         36            36               24 
 6.         25            26               27 
 7.         30            42               40 
 8.         30            35               36 
 9.         10            26               44 
10.         47            30               51 
11.         22            27               34 
12.         33            39               32 
13.         30            31               33 
14.         16            25               19 
15.         37            29               31 
16.         25            29               32 
17.         34            34               29 
21.         24            32               42 
22.         27            25               35 
23.         34            29               26 
24.         38            42               37 
25.         25            34 
26.         23            29 
27.         39            24 
28.         27            22 
30.         33            30 
31.         24            35 
32.         27            18 
33.         22            33 
34.         26            28 


*Code numbers were assigned to each school to assure their anonymity. 
It may be reported, however, that the numbers 1 through 17 represent 
the first community studied, and 21 through 34 represent the second 
community. The school in the second community that was to be 
designated number 29 was eliminated for reasons explained in Section II 
under the heading ‘‘Application of the Plan of Research’’. 

This section is divided into three parts. Part 
One reports the analysis of data for profession- 
al distance (frequency and divergency of disa- 
greements), Part Two reports an analysis for 
frequency of disagreements, without regard for 
their divergency. Part Three reports on item 
analysis for critical items. In each part the 
three instruments are analyzed separately. 


Part One: Professional Distance 


Teacher Practices.—The instrument for 
measuring professional distances for teacher 
practices consisted of 53 items which the sub- 
jects classified as ‘‘good’’, ‘‘poor’’, or ‘‘makes 
no difference’’, i.e., neither good nor poor. 
The responses of each teacher were compared 
with those of her rater. Differences in their re- 
sponses were assigned the following weights: 


Weight of 2: One professional worker classifying 
the practice as good; the other 
worker classifying it as poor. 

Weight of 1: One professional worker classifying 
a practice as making no difference; 
the other classifying it as 
good or as poor. 


Professional distance scores for teacher prac- 
tices were computed by summing the assigned 
weights. Scores for A, B, and C teachers are 
shown in Table II. 
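
The weighting rule for teacher practices can be sketched as follows. The numeric coding of the three classifications is an assumption chosen so that absolute differences reproduce the two stated weights; the names `POSITION`, `practice_weight`, and `practice_distance` are hypothetical, and the sample responses are made up.

```python
# Assumed positions on the three-way classification. With this coding the
# absolute difference is 2 for good vs. poor and 1 for "no difference"
# vs. either extreme, matching the weights stated in the text.
POSITION = {"good": 0, "no difference": 1, "poor": 2}


def practice_weight(teacher, rater):
    """Weight for one practice; 0 when the two classifications agree."""
    return abs(POSITION[teacher] - POSITION[rater])


def practice_distance(teacher_responses, rater_responses):
    """Professional distance score: the sum of the item weights."""
    return sum(
        practice_weight(t, r)
        for t, r in zip(teacher_responses, rater_responses)
    )


# Made-up classifications for four practices (not study data).
teacher = ["good", "poor", "no difference", "good"]
rater = ["poor", "poor", "good", "good"]
print(practice_distance(teacher, rater))  # 3
```

With 53 items and a maximum weight of 2 per item, a score under this rule can range from 0 to 106.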

Professional distance is shorter for A teach- 
ers than for either B or C teachers in 16 of the 
30 schools, and longer for C teachers than for 
either A or B teachers in 13 of the 30 schools. 
In 20 schools it is shorter for A teachers than 
for C teachers. 
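
The three school-level comparisons reported above can be tallied as in the sketch below. It assumes strict inequalities, so that ties are not counted, which the text does not state. The function name is hypothetical. The four sample rows are legible rows of Table II, read here as (good, average, poor); they illustrate the tally only and do not reproduce the reported totals for all 30 schools.

```python
def tally(rows):
    """Count schools satisfying each of the three comparisons.

    rows: iterable of (a, b, c) professional distance scores for the
    teachers rated good, average, and poor in one school.
    """
    a_shortest = sum(1 for a, b, c in rows if a < b and a < c)
    c_longest = sum(1 for a, b, c in rows if c > a and c > b)
    a_below_c = sum(1 for a, b, c in rows if a < c)
    return a_shortest, c_longest, a_below_c


# Four rows taken from Table II, read as (good, average, poor).
rows = [(22, 30, 18), (10, 26, 44), (22, 27, 34), (47, 30, 51)]
print(tally(rows))  # (2, 3, 3)
```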

Inventory of Beliefs. —The instrument to 
measure professional distance on beliefs in and 
related to education consisted of 120 statements 
to which the subjects selected one of the five fol- 
lowing responses to be their answer: (1) Yes, I 
definitely believe this statement; (2) I am in- 
clined to believe this statement; (3) I cannot say; 
(4) I am inclined not to believe this statement; 
or (5) No, I definitely do not believe this state- 
ment. The responses of each teacher were com- 
pared with those of her rater. The following 
weights were assigned to the differences between 
their responses to any one statement: 


Weight of 4: One party definitely believing; the 
other, definitely not believing. 

Weight of 3: One party definitely believing; the 
other, inclined not to believe. 

Weight of 3: One party definitely not believing; 
the other, inclined to believe. 

Weight of 2: One party inclined to believe; the 
other, inclined not to believe. 

Weight of 2: One party responding that he cannot 
say; the other, either definitely 
believing, or definitely not believing. 

Weight of 1: One party definitely believing; the 
other, inclined to believe. 

Weight of 1: One party definitely not believing; 
the other, inclined to not believe. 

Weight of 1: One party responding that he cannot 
say; the other, either inclined 
to believe, or inclined to not believe. 


The assigned weights were summed to determine
the professional distance score for beliefs.
These are shown in Table III.
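
The weights listed above coincide with the absolute difference between the two responses when the five responses are coded 1 through 5. The following is a minimal sketch of the scoring under that reading. The function names and the sample responses are hypothetical, and the weight of zero for identical responses is an assumption, since the text lists weights only for differing responses.

```python
from typing import Sequence

VALID = (1, 2, 3, 4, 5)  # 1 = definitely believe ... 5 = definitely do not believe


def belief_weight(teacher: int, rater: int) -> int:
    """Weight for one statement: the absolute difference of the coded responses."""
    if teacher not in VALID or rater not in VALID:
        raise ValueError("responses must be coded 1 through 5")
    return abs(teacher - rater)  # assumption: identical responses weigh 0


def professional_distance(teacher: Sequence[int], rater: Sequence[int]) -> int:
    """Sum of the assigned weights over all statements."""
    if len(teacher) != len(rater):
        raise ValueError("both parties must answer the same statements")
    return sum(belief_weight(t, r) for t, r in zip(teacher, rater))


# Hypothetical responses to six statements (the instrument itself had 120).
teacher_responses = [1, 2, 3, 4, 5, 2]
rater_responses = [5, 4, 3, 4, 4, 3]
score = professional_distance(teacher_responses, rater_responses)
print(score)  # 4 + 2 + 0 + 0 + 1 + 1 = 8
```

Coding the responses as integers keeps the eight listed weight rules in a single expression; a lookup table keyed on response pairs would serve equally well.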

Professional distance for beliefs is shorter 
for A teachers than for either B or C teachers 
in 11 schools of the 30 studied. It is longer for 
C teachers than for either A or B teachers in 14
schools. In 19 schools, professional distance 
for A teachers is shorter than that for C teach- 
ers. 

A second analysis of the belief inventory was
made in which only assigned weights for differences
that suggested opposition were summed.
Such differences were those assigned weights of
2 in which one professional worker was inclined
to believe and the other worker was inclined not
to believe. The resultant scores, representing
Oppositional Professional Distance for Beliefs,
are shown in Table IV.
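
A hedged sketch of the oppositional score follows. The paragraph above names only the weight-2 case explicitly; the item analysis in Part Three defines oppositional disagreements as those weighted 4 or 3 plus that weight-2 case, and the sketch assumes the same rule applies here. Responses are coded 1 through 5 as an illustrative convention, and all names are hypothetical.

```python
from typing import Sequence


def oppositional_weight(teacher: int, rater: int) -> int:
    """Weight counted only when the parties stand on opposite sides of 'cannot say'.

    Codes: 1 = definitely believe, 2 = inclined to believe, 3 = cannot say,
    4 = inclined not to believe, 5 = definitely do not believe.
    """
    # Assumption: opposition means one code below 3 and the other above 3.
    if 0 > (teacher - 3) * (rater - 3):
        return abs(teacher - rater)  # 2, 3, or 4
    return 0


def oppositional_distance(teacher: Sequence[int], rater: Sequence[int]) -> int:
    """Sum of the weights for oppositional differences only."""
    return sum(oppositional_weight(t, r) for t, r in zip(teacher, rater))


teacher_responses = [1, 2, 3, 4, 5, 2]  # hypothetical
rater_responses = [5, 4, 3, 4, 4, 3]    # hypothetical
opp_score = oppositional_distance(teacher_responses, rater_responses)
print(opp_score)  # 4 + 2 = 6; the two weight-1 differences are not oppositional
```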

Oppositional professional distance for beliefs
is shorter for A teachers than for either B or C
teachers in 14 schools. It is longer for C teachers
than for either A or B teachers in 17 schools.
In 19 schools it is shorter for the A teachers
than it is for the C teachers.

TABLE III
PROFESSIONAL DISTANCE SCORES FOR BELIEFS

School      Teacher       Teacher          Teacher
Code No.    Rated Good    Rated Average    Rated Poor
 1.         171           195              210
 2.         167           140              140
 3.         148           129              168
 4.         127           171              179
 5.         157           179              201
 6.         181           155              175
 7.         176           158              138
 8.         211           149              [?]
 9.         180           172              [?]
10.         185           179              207
11.         146           173              186
12.         164           176              159
13.         143           133              142
14.         114           114              143
15.         140           197              176
16.         161           160              182
17.         182           174              165
21.         165           174              187
22.         190           176              194
23.         145           199              155
24.         229           215              193
25.         188           183              189
26.         157           179              159
27.         139           178              148
28.         102           174              193
30.         136           185              131
31.         179           [?]              161
32.         153           [?]              201
33.         165           181              176
34.         162           149              142


TABLE IV
OPPOSITIONAL DISTANCE SCORES FOR BELIEFS

School      Teacher       Teacher          Teacher
Code No.    Rated Good    Rated Average    Rated Poor
 1.          99           133              162
 2.         102            83               78
 3.          93            60              115
 4.          78           102              120
 5.          75            84              109
 6.         105            92              127
 7.         130           124               88
 8.         158            78               70
 9.         101            79              111
10.         155           161              187
11.          92           101              119
12.         112           105               97
13.         104            86               92
14.          50            64               65
15.         104           156              138
16.          97           101              124
17.         126           136              156
21.         100           118              123
22.         158            99              150
23.          84           139               92
24.         203           129              147
25.         160           109              167
26.         113           156               90
27.          73            78               91
28.         [?]           124              139
30.         [?]           130               75
31.         118           113               78
32.         131           101              176
33.         103           125              133
34.         107            97               79


Teacher Factors. The instrument to measure
professional distance for teacher factors
consisted of 25 items found on the teacher rating
scale of the first community studied. Subjects
were asked to rank the importance of each
factor on the following five-point scale: (1) of
utmost importance; (2) very important; (3) important;
(4) usable, but not important; (5) insignificant.
The responses of each teacher were
compared with those of her rater, and the differences
between each of their responses were
assigned the following weights:


Weight of 4: One party ranking the factor of utmost
importance; the other, as insignificant.

Weight of 3: One party ranking the factor of utmost
importance; the other, usable but not important.

Weight of 3: One party ranking the factor very
important; the other, insignificant.

Weight of 2: One party ranking the factor as
important; the other, either of utmost
importance, or insignificant.

Weight of 2: One party ranking the factor as very
important; the other, usable but
not important.

Weight of 1: One party ranking the factor very
important; the other, either of utmost
importance, or important.

Weight of 1: One party ranking the factor as usable
but not important; the other,
either important, or insignificant.


The assigned weights for the differences between
the responses were summed to arrive at a
professional distance score for teacher factors.
These scores are shown in Table V.

Professional distance for teacher factors is 
shorter for A teachers than for either B or C 
teachers in 13 of the schools studied. It is long- 
er for C teachers than for either A or B teach- 
ers in 8 schools. In 16 schools, professional 
distance is shorter for A teachers than it is for 
C teachers. 

A second type of analysis was made of the
assigned weights. First, the assigned weights
were marked plus (+) if the teacher ranked the
factor as more important than did her rater.
All other weights were marked minus (-). Second,
the plus and minus weights were summed
algebraically to arrive at a Compensated Score
of Professional Distance for Teacher Factors.
The assumption for such an analysis was that a
teacher would not be rated lower if she thought
a teacher factor less important than her rater
did, provided that she thought some other factor
more important than did her rater. Since
plus and minus values were summed algebraically,
Compensated Scores could be zero (0), a
positive quantity, or a negative quantity. In
interpreting such scores, zero would suggest the
absence of professional distance. Direction of
professional distance would be indicated by the
sign: positive scores suggest that the teacher
classifies factors as more important than her
rater; negative scores suggest that she classifies
them as less important than her rater.
Length of professional distance is indicated by
the integer. When two scores are compared for
length of professional distance and one is negative,
only the integers are compared. Thus, in school
number 34 the A teacher’s score of plus 5 is
considered to be a shorter professional distance
than the C teacher’s score of minus 7. Compensated
Scores are shown in Table VI.
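
The compensated score can be sketched as a signed sum. The sketch assumes ranks are coded 1 (of utmost importance) through 5 (insignificant), so that a lower code means "more important"; the function names and sample ranks are hypothetical.

```python
from typing import Sequence


def signed_weight(teacher: int, rater: int) -> int:
    """Plus when the teacher ranks the factor as more important than her rater.

    With 1 = utmost importance and 5 = insignificant, 'more important' means
    a smaller code, so rater - teacher is positive in that case.
    """
    return rater - teacher


def compensated_score(teacher: Sequence[int], rater: Sequence[int]) -> int:
    """Algebraic sum of the plus and minus weights."""
    return sum(signed_weight(t, r) for t, r in zip(teacher, rater))


def distance_length(score: int) -> int:
    """Length of professional distance: the score without its sign."""
    return abs(score)


teacher_ranks = [1, 3, 5, 2]  # hypothetical ranks for four factors
rater_ranks = [2, 3, 3, 4]
comp = compensated_score(teacher_ranks, rater_ranks)
print(comp)  # (+1) + 0 + (-2) + (+2) = 1

# The comparison cited for school 34: plus 5 is shorter than minus 7.
a_is_shorter = distance_length(-7) > distance_length(5)
print(a_is_shorter)  # True
```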

Professional distance, measured by such an
analysis, is shorter for A teachers than either B
or C teachers in 12 of the schools studied. It
is longer for C teachers than for either A or B
teachers in 12 schools. In 18 schools the A
teacher’s professional distance is shorter than
the C teacher’s.

A third analysis was made of the assigned
weights. Plus and minus signs were added similarly
to the method applied in the second analysis,
but then only the negative values were summed
to arrive at a Less Than Score. The assumptions
were that no compensation factor operated
in any teacher being considered good,
average, or poor; that her thinking a factor
more important than her rater’s opinion of the
same factor has no bearing on her rating as
a teacher; and that only her thinking factors to
be less important than her rater’s opinion of
them bears on her rating as a teacher. Less
Than Scores are shown in Table VII. In interpreting
these scores, zero suggests the absence
of professional distance, and the higher the score
the greater the professional distance.
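
A short sketch of the Less Than Score, under the same illustrative coding of ranks (1 = of utmost importance, 5 = insignificant). Because higher scores are read as greater distance, the sketch reports the total of the minus weights without its sign; that reading, the names, and the sample ranks are assumptions.

```python
from typing import Sequence


def less_than_score(teacher: Sequence[int], rater: Sequence[int]) -> int:
    """Total weight of the factors the teacher ranks as less important than her rater.

    A larger code means less important, so only pairs with teacher > rater count.
    """
    return sum(t - r for t, r in zip(teacher, rater) if t > r)


teacher_ranks = [1, 3, 5, 2]  # hypothetical
rater_ranks = [2, 3, 3, 4]
lt_score = less_than_score(teacher_ranks, rater_ranks)
print(lt_score)  # only the third factor counts: 5 - 3 = 2
```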

Such an analysis indicates that professional 
distance is shorter for the A teacher than for 
either the B or C teacher in 9 of the 30 schools 
studied. It is longer for C teachers than for A
or B teachers in 11 schools. In 18 schools pro- 
fessional distance for A teachers is shorter than 
for C teachers. 

Summary of Analyses for Professional Distance.
Professional Distance scores were
computed according to a variety of described
procedures for each of the three instruments.
For each procedure (1) the number of schools in
which the A teacher’s professional distance
score was lower than either the B or C teacher’s,
(2) the number of schools in which the C teacher’s
professional distance score was higher than
either the A or B teacher’s score, and (3) the
number of schools in which the A teacher’s
score was lower than the C teacher’s score
were found. These are summarized in Table
VIII.
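
The three counts can be illustrated with the Less Than Scores as printed in Table VII. The sketch below assumes that ties are not counted (strict inequalities); under that assumption it reproduces the 9, 11, and 18 schools reported in the text for this procedure. The variable and function names are hypothetical.

```python
from typing import Dict, Tuple

# School code -> (rated good, rated average, rated poor), from Table VII.
LESS_THAN_SCORES: Dict[int, Tuple[int, int, int]] = {
    1: (5, 1, 2), 2: (8, 0, 2), 3: (5, 14, 9), 4: (4, 8, 0), 5: (0, 0, 2),
    6: (16, 30, 19), 7: (2, 0, 13), 8: (0, 13, 1), 9: (0, 9, 12),
    10: (2, 0, 15), 11: (0, 0, 0), 12: (15, 15, 17), 13: (1, 0, 9),
    14: (1, 8, 3), 15: (11, 12, 7), 16: (2, 13, 19), 17: (3, 18, 8),
    21: (12, 13, 8), 22: (6, 12, 1), 23: (9, 14, 11), 24: (17, 23, 6),
    25: (6, 5, 4), 26: (10, 9, 13), 27: (8, 7, 8), 28: (0, 2, 1),
    30: (7, 3, 10), 31: (9, 1, 0), 32: (7, 8, 5), 33: (2, 2, 3),
    34: (5, 5, 13),
}


def summarize(scores: Dict[int, Tuple[int, int, int]]) -> Tuple[int, int, int]:
    """Return (A lowest, C highest, A lower than C) as numbers of schools."""
    rows = list(scores.values())
    a_lowest = sum(1 for a, b, c in rows if min(b, c) > a)   # ties excluded
    c_highest = sum(1 for a, b, c in rows if c > max(a, b))  # ties excluded
    a_below_c = sum(1 for a, b, c in rows if c > a)
    return a_lowest, c_highest, a_below_c


summary = summarize(LESS_THAN_SCORES)
print(summary)  # (9, 11, 18)
```

The same function applies unchanged to the scores from any of the other procedures, since each table has the same three columns per school.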

TABLE V
PROFESSIONAL DISTANCE SCORES FOR TEACHER FACTORS

School      Teacher       Teacher          Teacher
Code No.    Rated Good    Rated Average    Rated Poor
 1.         19            35               26
 2.         16            30               29
 3.          9            19               14
 4.         19            17               21
 5.         25            31               22
 6.         23            32               20
 7.         22            29               17
 8.         28            16               21
 9.         28            19               19
10.         25            26               26
11.         21            30               40
12.         18            19               18
13.         19            15               17
14.         18            19               17
15.         14            17               13
16.          8            23               22
17.         12            21               16
21.         16            18               17
22.         15            18               11
23.         14            18               17
24.         18            25               11
25.         11            13               18
26.         12            13               17
27.         16            17               20
28.         16            15               22
30.         16            18               13
31.         16            18               25
32.         18            22               18
33.         26            31               20
34.         15            15               16


TABLE VI
COMPENSATED SCORES OF PROFESSIONAL DISTANCE FOR
TEACHER FACTORS

School      Teacher       Teacher          Teacher
Code No.    Rated Good    Rated Average    Rated Poor
 1.         [?]            33               22
 2.           0            30               25
 3.          -1            -9               -4
 4.          11             1               21
 5.          25            31               18
 6.         [?]           -28              -18
 7.          18            29               -9
 8.          28            -6               19
 9.          28             1              [?]
10.          21            26               -4
11.          21            30               40
12.         -12           [?]              -16
13.          17            15               -1
14.          16             3               11
15.          -8            -6              [?]
16.           6            -3              [?]
17.           1           -15              [?]
21.         [?]            -8              [?]
22.         [?]           [?]              [?]
23.         [?]           [?]              [?]
24.         [?]           [?]              [?]
25.         [?]           [?]              [?]
26.         [?]           [?]              [?]
27.         [?]           [?]              [?]
28.         [?]           [?]              [?]
30.         [?]           [?]              [?]
31.         [?]           [?]              [?]
32.         [?]           [?]              [?]
33.         [?]           [?]              [?]
34.         [?]           [?]              [?]


TABLE VII
LESS THAN SCORES OF PROFESSIONAL DISTANCE FOR
TEACHER FACTORS

School      Teacher       Teacher          Teacher
Code No.    Rated Good    Rated Average    Rated Poor
 1.          5             1                2
 2.          8             0                2
 3.          5            14                9
 4.          4             8                0
 5.          0             0                2
 6.         16            30               19
 7.          2             0               13
 8.          0            13                1
 9.          0             9               12
10.          2             0               15
11.          0             0                0
12.         15            15               17
13.          1             0                9
14.          1             8                3
15.         11            12                7
16.          2            13               19
17.          3            18                8
21.         12            13                8
22.          6            12                1
23.          9            14               11
24.         17            23                6
25.          6             5                4
26.         10             9               13
27.          8             7                8
28.          0             2                1
30.          7             3               10
31.          9             1                0
32.          7             8                5
33.          2             2                3
34.          5             5               13


TABLE VIII
SUMMARY OF ANALYSES FOR PROFESSIONAL DISTANCE

[Table body not recoverable from the source.]


Conclusion. The data do not completely
support the hypothesis stated earlier. The shortest
professional distance is not always between
the rater and the teacher he rates good, nor is
it always longest between the rater and the teacher
he rates poor. Depending upon the method of
analysis and the instrument, the number of
schools in which the A teacher’s score is lowest
varies from 9 to 16. Similarly, the number of
schools in which the C teacher’s score is the
highest varies from 8 to 17. The number of
schools in which the A teacher’s score is less
than the C teacher’s score varies between 16 and
20, depending on the method of analysis and the
instrument.

Apparently, the behaviors required by the
teachers’ raters for persons performing the role
of their good teachers function in a more complex
manner. Behaviors required by the raters
may be classified into two or more classifications.
One classification may be considered
“core” behaviors, on which from the rater’s
point of view there is no controversy, and on
which agreement is necessary for a teacher to
be considered good by him. A second classification
may be considered “peripheral” behaviors,
on which there exists an unresolved controversy
and on which differences in points of
view may be understood and accepted. The instruments
used in this study are loaded with
items from the second classification, since controversial
items were sought for them. It may
be that instruments designed to require the rater
to (1) state his position, and (2) state whether
he would accept alternate behaviors, would possibly
measure professional distance more precisely.

Another factor seems to be operating in the
measurement of teachers. People do not disagree
with one another with equal amounts of
tact. It seems possible that a teacher who holds
a point of view quite distant from that of her rater
may compensate for the possible conflict
by being quite tactful about their differences.
On the other hand, a teacher disagreeing on only
a few issues may do so with so little tact that a
disproportionate weight is placed on her divergent
points of view.

A third factor may help explain the findings.
Apparently, some raters of teachers are more
tolerant than others. Raters of teachers who
are tolerant of conflicting points of view may
accept frequent and divergent opinions and not
permit them to affect their ratings. On the other
hand, raters of teachers who are intolerant of
conflicting points of view may base their ratings,
in part, on teachers’ non-acceptance of their
points of view. These are only three suggestions
that may clarify the findings.


Part Two: Frequency of Disagreements


Disagreements between raters and the teachers
they rated were next analyzed without regard
for the amounts of their divergencies. It
was hypothesized that C teachers disagree with 
their raters more frequently than do B teachers, 
who in turn disagree with their raters more fre- 
quently than do A teachers. Many types of dis- 
agreements were found. 

Procedures for analyzing each instrument
are reported separately. The frequencies
of each type of disagreement, along with partial
and complete totals of disagreements for each
instrument, are shown in Tables IX through XII.
A summary of these tables, indicating the number
of schools in which (1) the A teachers disagree
less frequently than do either the B or
C teachers, (2) the C teachers disagree more
frequently than do either the A or B teachers,
and (3) the A teachers disagree less frequently
than do the C teachers, appears in Table XIII.

Teaching Practices. Two types of differences
were recognized in analyzing the responses
to this instrument. When one professional
worker classified the practice as “good” and
the other worker classified it as “poor”, the
difference was considered an oppositional difference.
All other disagreements were non-oppositional
differences. Frequencies for oppositional,
non-oppositional, and total differences
for Teacher Practices for A, B, and C teachers
are shown in Table IX.
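
As an illustration of this two-way classification, the sketch below tallies the differences for one teacher and her rater. The response labels follow the three classifications used by the instrument; the function name and the sample responses are hypothetical.

```python
from collections import Counter
from typing import Sequence


def classify_practice_pair(teacher: str, rater: str) -> str:
    """Classify one practice as agreement, oppositional, or non-oppositional."""
    if teacher == rater:
        return "agreement"
    if {teacher, rater} == {"good", "poor"}:
        return "oppositional"
    return "non-oppositional"  # one party answered "makes no difference"


def tally_differences(teacher: Sequence[str], rater: Sequence[str]) -> Counter:
    """Frequencies of each kind of difference, plus the total of all differences."""
    counts = Counter(classify_practice_pair(t, r) for t, r in zip(teacher, rater))
    counts["total"] = counts["oppositional"] + counts["non-oppositional"]
    return counts


# Hypothetical responses to four practices (the instrument itself had 53).
teacher_says = ["good", "poor", "makes no difference", "good"]
rater_says = ["poor", "poor", "good", "makes no difference"]
practice_counts = tally_differences(teacher_says, rater_says)
print(practice_counts["oppositional"], practice_counts["non-oppositional"],
      practice_counts["total"])  # 1 2 3
```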

Inventory of Beliefs. Four types of differences
were recognized in analyzing the responses
to this instrument and are named and defined
as:


Type 1: Non-oppositional differences

a) One professional worker responding “cannot
say”, the other indicating any other response.

b) One subject responding “Definitely believing”,
the other responding “Inclined to
believe”.

c) One subject responding “Definitely not believing”,
the other responding “Inclined
not to believe”.


Type 2: Mild Opposition
One subject responding “Inclined to believe”,
the other responding “Inclined not to believe”.


Type 3: Moderate Opposition
a) One subject responding “Definitely believing”,
the other responding “Inclined not
to believe”.

b) One subject responding “Definitely not believing”,
the other responding “Inclined
to believe”.


Type 4: Strong Opposition
One subject responding “Definitely believing”,
the other responding “Definitely not believing”.


Frequencies for each type of disagreement for
beliefs for A, B, and C teachers are shown in
Table X.

Partial and complete totals of frequencies of
disagreements for beliefs are shown in Table XI.
Partial totals of types four and three are frequencies
of strong and moderate disagreements.
Partial totals of types four, three, and two disagreements
are frequencies of oppositional differences.
Total differences are the sums of all
types of disagreements.
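
The four types and their partial totals can be sketched as follows. Responses are coded 1 (definitely believe) through 5 (definitely do not believe) as an illustrative convention; the names and sample responses are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def difference_type(teacher: int, rater: int) -> int:
    """Return 0 for agreement, otherwise the type of difference (1 to 4)."""
    if teacher == rater:
        return 0
    # Opposition: one response below "cannot say" (3) and the other above it.
    if 0 > (teacher - 3) * (rater - 3):
        return abs(teacher - rater)  # 2 = mild, 3 = moderate, 4 = strong
    return 1  # non-oppositional difference


def difference_totals(teacher: Sequence[int], rater: Sequence[int]) -> Dict[str, int]:
    """Partial and complete totals of the frequencies of each type."""
    freq = Counter(difference_type(t, r) for t, r in zip(teacher, rater))
    strong_moderate = freq[4] + freq[3]
    oppositional = strong_moderate + freq[2]
    return {
        "types 4 and 3": strong_moderate,
        "types 4, 3, and 2": oppositional,
        "all types": oppositional + freq[1],
    }


teacher_responses = [1, 2, 3, 4, 5, 2, 1]  # hypothetical
rater_responses = [5, 4, 3, 4, 4, 3, 4]
totals = difference_totals(teacher_responses, rater_responses)
print(totals)  # {'types 4 and 3': 2, 'types 4, 3, and 2': 3, 'all types': 5}
```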

Teaching Factors. —In the analysis of this 
instrument for type of disagreements, the dif- 
ferences in the responses between the teachers 




TABLE IX
FREQUENCIES OF TYPES OF DIFFERENCES ON TEACHER PRACTICES 


Frequencies of 


Oppositional Non-oppositional Total 


School __ Differences Differences Differences 
B B Cc A B 


Code No. 
18 18 24 


7 26 12 
22 16 26 
20 28 

8 22 
16 21 
16 29 
19 27 
16 21 
12 21 
23 25 
13 26 

9 20 

18 
24 
21 
25 
27 
22 
24 
40 
29 
24 
16 
16 
18 
31 
15 
24 
23 


> 


— 


1. 
2. 
3. 
4. 
5. 1 
6. 
7. 
8. 
9. 


— 


— 


— 



Cc 
14 
24 19 

15 26 
18 29 
12 16 
17 22 
24 38 
18 27 
8 27 
10. 17 32 

11. 8 29 
12. ll 22 
13. 10 18 
14, 8 14 
15. 21 27 
16. 15 26 
17. 8 22 
21. 14 33 
22. 13 25 
23. 18 21 
24. 32 34 
25. 17 29 
26. 19 23 
27. 13 20 
28. 9 15 
30. 11 26 
31. 14 22 
32. 15 19 
33. 12 25 
34. 8 31 


bz 
Lal 
eT 
LI 
or 
bt 
61 
Sz 
L 

bz 
6 

st 


wo 


£ 
T 
0 
8 
0 
8 
0 
8 
8 
8 
£ 
0 
0 
L 
6 
8 
6 
9 


TABLE X
FREQUENCIES OF TYPES OF DIFFERENCES FOR BELIEFS

[Table body not recoverable from the source.]


TABLE XI
PARTIAL AND COMPLETE TOTALS OF TYPES OF DIFFERENCES
FOR BELIEFS

            Totals of           Totals of           Total of
School      Types 3 and 4       Types 2, 3 and 4    All Types
Code No.    A     B     C       A     B     C       A     B     C
 1.         26    37    43      30    37    46      79    75    82
 2.         26    20    20      31    29    26      81    76    73
 3.         25    14    29      [?]   [?]   37      77    79    76
 4.         13    31    31      32    32    39      73    86    93
 5.         18    24    32      24    36    33      80    91    90
 6.         30    20    33      35    33    42      95    75    83
 7.         36    34    21      39    34    28      74    60    68
 8.         41    22    19      42    23    20      73    72    [?]
 9.         23    30    33      33    28    33      89    95    91
10.         40    42    50      45    42    50      71    58    70
11.         20    27    37      35    36    40      89    97   106
12.         31    28    27      33    34    28      70    90    [?]
13.         30    24    23      31    26    31      65    66    77
14.          9    10    15      19    27    23      76    74    84
15.         28    43    39      [?]   [?]   40      65    82    66
16.         29    28    33      [?]   [?]   41      88    88    92
17.         37    38    40      38    40    40      80    67    47
21.         27    32    34      34    36    38      81    76    81
22.         42    28    38      42    32    42      66    85    75
23.         24    40    23      26    40    31      [?]   89    85
24.         53    36    42      55    37    43      74    96    [?]
25.         42    28    45      43    33    45      65    88    61
26.         32    43    23      32    45    31      65    66    82
27.         17    21    19      24    25    34      74   100    81
28.         13    34    41      18    40    41      61    [?]   84
30.         22    39    16      [?]   41    28      81    90    79
31.         35    32    24      36    36    26      90    91    85
32.         37    28    49      38    28    50      57    45    [?]
33.         30    36    37      31    37    39      75    79    70
34.         31    22    23      34    33    26      81    78    75




and their raters were classified into two groups.
The first group contained those differences in
which the teacher thought the factor more important
than did her rater; the second group contained
those differences in which the teacher
thought the factor less important than did her
rater. The frequencies for both groups of disagreements,
together with the total number of
disagreements for A, B, and C teachers, are
shown in Table XII.

Summary of Frequencies of Disagreements.
The hypothesis studied in Part II of this discussion
was that C teachers disagree with their raters
more frequently than do either A or B teachers,
and that A teachers disagree with their
raters less frequently than do either B or C
teachers. Thirteen classifications of types of
disagreements were recognized and studied.
Tables IX through XII were analyzed to determine
the number of schools in which (1) A teachers
disagree with their raters less frequently
than do either B or C teachers, (2) C teachers
disagree with their raters more frequently
than do either A or B teachers, and (3) A teachers
disagree with their raters less frequently
than do C teachers. These numbers of schools
are shown in Table XIII.

Conclusions. The data do not completely
support the hypothesis stated earlier in Part
Two of this analysis.

Depending upon the instrument and the classification
of type of disagreements analyzed, the
number of schools in which A teachers disagree
less frequently than do either B or C teachers
varies from 9 to 16. For 12 of the 13 classifications
of types of disagreements the number of
schools in which A teachers disagree less frequently
was slightly less than 50 percent of the
30 schools studied. It appears that the number
of times A teachers disagree with their raters
more frequently than do either B or C teachers
is slightly more than the number of times
they disagree less frequently than either B or C
teachers.

Depending upon the classification of types of
disagreements studied, the number of schools
in which C teachers disagree more frequently
than do either A or B teachers varies from 8 to
14. This range would suggest that the number
of schools is always slightly less than 50 percent
of the schools studied. It appears, therefore,
that the number of times C teachers
disagree with their raters more frequently than
do A or B teachers is slightly less than the number
of times they disagree less frequently than
do A or B teachers.

It is therefore concluded that the frequency
of disagreement slightly increases as ratings increase
from poor to average and from average
to good.

Depending upon the classification of types of
disagreements, the number of schools in which
A teachers disagree with their raters less frequently
than do C teachers varies from 13 to 19
(average 16). Such an average is slightly more
than 50 percent of the 30 schools studied. It
is therefore concluded that the frequency of disagreement
decreases slightly as ratings increase
from poor to good. Such a conclusion
does not entirely contradict the conclusion
stated above which compared frequency of disagreement
to ratings increasing from poor to
average and from average to good. It is suggested
that teachers rated average may not necessarily
be between good and poor, a finding suggested
by Lamke (19).

The interpretations of the findings of Part
One, Professional Distance, seem equally applicable
here. It was suggested that required behaviors
for performing the role of the good teacher
may exist as “core” behaviors, over which there
may be no controversy and little or no disagreement,
and “peripheral” behaviors, over which
controversy and disagreement are acceptable.
If this is so, then frequencies of disagreements
over both core and peripheral required behaviors
may have less meaning for comparisons
with teacher ratings.

Secondly, the suggestion, made in the conclusions
to Part One, that the manner of disagreement
may be a potent factor along with the number
of disagreements seems to apply to the findings
on frequencies of disagreements. These are
only two suggestions that may clarify the findings
on frequency of disagreements.


Part Three: Item Analysis


The differences between the points of view of
the teachers and their raters were analyzed by
items. It was hypothesized that if A teachers
disagreed less frequently and less divergently
from their raters than did either B or C teachers
on certain items, those items may be considered
critical in that agreement with one’s rater
may be associated with a teacher being considered
good by her rater. For convenience,
such items were called “A teacher items”. Similarly,
it was hypothesized that if C teachers
disagreed more frequently and more divergently
from their raters than did either A or B teachers
on certain items, those items may be considered
critical in that disagreement with one’s
rater may be associated with a teacher being
considered poor by her rater. Such items were
called “C teacher items”.

Teacher Practices. This instrument consisted
of 53 teacher practices which subjects
classified as “good”, “poor”, or “makes no
difference”. The responses of the teachers
were compared with those of their raters. Two
types of disagreements between their responses




TABLE XII
FREQUENCIES OF TYPES OF DIFFERENCES FOR TEACHER FACTORS 


Frequencies of Differences 


Teachers thinking Teachers thinking Teachers thinking 
item more impor- item less impor- different from 
tant tant 


QO 
> 


1. 
2. 
3. 
4. 
5. 
6. 
7. 
8. 
9. 


& 

NO SUID 


— 
— 


OO VL 


— 


Ne 


coon 


| 
School 
117 1 
8 18 0 
4 5 10 9 > 
18 0 2 18 18 15 
5 2 
16.22 nw 
17 7 10 
11. 142 0 
12. 3 4 11 6120 18 
13. i213 
14, 13 9 
15. 3 5 
16. 
21. 13 «1615 
22. 
24, 
25. 
26. 
27. 14 
28. 
30. 
31. 14 «15 
32. 20 #19 
33. 
34. 




TABLE XIII
SUMMARY OF THE FREQUENCY TABLES 


Number of Schools of the 30 Studied in which 


A disagrees less C disagrees more A disagrees less 
Instruments and Types frequently than frequently than frequently 
of Differences Borc AorB than C 


Teacher Practices 


Oppositional 
Differences 


Non-oppositional 
Differences 


Total Differences 
Inventory of Beliefs 
Type 4 (Strong 

Opposition) 


Type 3 (Moderate 
Opposition) 


Type 2 (Mild 
Opposition) 


Type 1 (Non- 
Opposition) 


Sub-Totals 
Types 4 and 3 


Types 4,3,2 
(All Opposition) 


Types 4 through 1 
(All Differences) 


Teacher Factors 
First Group 


Second Group 
Total Differences 


10 10 17 
13 10 18 
16 ll 18 
13 
10 1s 
13 4 11 
13 14 19 
13 10 18 
12 10 1s 
9 10 17 | 




were used in the item analysis. The first type 
were called oppositional disagreements and 
were defined as those in which one subject clas- 
sified the practice as ‘‘good’’, while the other 
classified it as ‘‘poor’’. The second type of dis- 
agreements were called total disagreements and 
were defined as those in which the two profes- 
sional workers responded differently from one 
another. For each item the number of opposi- 
tional disagreements and total disagreements 
between the A teachers and their raters were 
found. Similarly, for each item the number of 
oppositional disagreements and total disagree- 
ments for B and C teachers were found. These 
data are shown in Table XIV. 

In analyzing Table XIV, A teacher items were
defined as those on which (1) the number of disagreeing
A teachers is zero, while the number
of disagreeing B or C teachers is one or more,
or (2) the number of disagreeing A teachers is
50 percent or less of the number of disagreeing
B teachers or C teachers, whichever is less.
Items numbered 2, 3, 23, 24a, 24c, and 41 are
A teacher items.

C teacher items were defined as those on
which (1) the number of disagreeing C teachers
is one or more, while the number of disagreeing
A or B teachers is zero, or (2) the number
of disagreeing C teachers is 200 percent or
more of the number of disagreeing A or B teachers,
whichever is larger. Items numbered 2,
4, 5, 6d, 19, and 22 are C teacher items.
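
The two definitions can be sketched as a pair of tests on the numbers of disagreeing A, B, and C teachers for an item. The wording of the rules leaves some cases open, so the sketch makes two assumptions: in rule (1) for C teacher items, both the A and the B counts must be zero, and an item on which no teacher disagrees is neither kind of item. The function names and the sample counts are hypothetical.

```python
def is_a_teacher_item(a: int, b: int, c: int) -> bool:
    """a, b, c: numbers of disagreeing A, B, and C teachers on one item."""
    if a == 0:
        # Rule (1): no A teacher disagrees while some B or C teacher does.
        return b >= 1 or c >= 1
    # Rule (2): a is 50 percent or less of the smaller of b and c.
    return min(b, c) >= 2 * a


def is_c_teacher_item(a: int, b: int, c: int) -> bool:
    if c == 0:
        return False  # assumption: an item nobody in C disputes is not a C item
    if a == 0 and b == 0:
        # Rule (1), read as requiring that neither A nor B teachers disagree.
        return True
    # Rule (2): c is 200 percent or more of the larger of a and b.
    return c >= 2 * max(a, b)


# Hypothetical counts of disagreeing (A, B, C) teachers.
print(is_a_teacher_item(0, 3, 1))  # True by rule (1)
print(is_a_teacher_item(2, 4, 5))  # True by rule (2): 2 is half of 4
print(is_a_teacher_item(3, 4, 5))  # False
print(is_c_teacher_item(1, 2, 4))  # True by rule (2): 4 is twice 2
print(is_c_teacher_item(1, 2, 3))  # False
```

Under these rules a single item can satisfy both definitions, as with counts of (1, 2, 4), which is consistent with one item appearing in both lists.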

Conclusions to Item Analysis of Teacher
Practices. The data support the hypothesis
that there are certain items on which A teachers
disagree with their raters less frequently
than do either B or C teachers. Six such items
were found. They are:


2. Stands at the side of the room.
3. Stands at the rear of the room.
23. Organizes subject matter into psychologically
arranged form (from pupils’
experiences to logical generalizations).
24a. Assignments: page to page in textbook.
24c. Assignments: general topics and nothing
more.
41. Measures results of learning by changed
pupils’ attitudes and behaviors.


Further, the data support the hypothesis that
on certain items C teachers disagree with their
raters more frequently than do either A or B
teachers. Six such items were found. They are:


2. Stands at the side of the room.
4. Sits at desk.
5. Sits on pupil’s desk at front of room.
6d. Sits in pupil’s seat at rear of room.
19. Provides for individual differences by
differentiating assignments (the contract
plan, unit instruction, level assignments,
etc.).
22. Organizes subject matter in problem-project
form.


One item, number 2, appears to be both an
A and C teacher item. Apparently, agreement
with one’s rater on the evaluation of the practice
of standing at the side of the room may be
associated with a teacher being considered a
good teacher, while disagreement with one’s rater
on the evaluation of this practice may be associated
with a teacher being considered a poor
teacher.

Individual items and their implications are 
discussed in more detail in the summary and con- 
clusions to Part Three. 

The Inventory of Beliefs. This instrument
consisted of 120 statements of beliefs related to
education. Subjects were asked to select one of
the following responses as their answer: (1) Yes,
I definitely believe this statement; (2) I am inclined
to believe this statement; (3) I cannot say;
(4) I am inclined not to believe this statement;
or (5) No, I definitely do not believe this statement.
Responses of the teachers were compared
with those of their raters, and weights
were assigned to the differences as in Part
One of this analysis.

Two approaches were followed in analyzing 
this instrument by items. The first approach 
considered frequencies of disagreements without 
regard to the degree of divergencies. The sec- 
ond approach considered both frequencies and 
divergencies. Four analyses were made of the 
data prepared for each of the two approaches. 
Critical items are quoted in the summary and 
conclusions of the item analysis of this instrument.

Analysis of Frequencies of Disagreements.
Two classifications of disagreements were used
in the analysis. The first classification, called
oppositional disagreements, was defined as those
assigned weights of 4 or 3 plus those assigned
a weight of two when one subject responded, “I
am inclined to believe this statement,” and the
other responded, “I am inclined not to believe
this statement.” The second classification,
called total disagreement, was defined as those
on which the teacher responded differently
from her rater. For each item the number
of A teachers who opposed their raters and
who responded differently from them were found.
Similarly, the number of B and C teachers who
opposed and responded differently from their
raters were found for each item. These frequencies
of A, B, and C teachers are shown in
Table XV.

In analyzing the oppositional frequencies, A 
teacher items were defined as any item on which
the number of A teachers who oppose their rat- 






TABLE XIV 
ITEM ANALYSIS OF TEACHING PRACTICES 


Oppositional Non-oppositional 
A B 


14 
16 
17 
15 
17 
16 
15 
12 
14 
16 
13 
10 


— 


Crocs 84420, 02420 


A 
3 
0 
i 
0 
0 
0 
2 
3 
2 
2 
0 
2 
3 
3 
4 
6 
5 
7 
9 
8 
9 
5 
4 
2 
7 
0 
5 
3 
1 
0 
2 
6 
1 
9 
0 
6 
3 
3 
3 
0 
0 
4 
7 
7 
5 
6 
3 
4 
3 
8 
6 
2 
6 


*Three numbers (6, 16, and 24) of the sequence use letter suffixes, following
the pattern of the source of the items. See beginning of Section III for a
description of the instrument and the procedure followed in its construction.

Note: This table should be read as follows: Three A teachers opposed
their raters on item one; two B teachers opposed their raters on item one;
... 14 A teachers differed but did not oppose their raters on item one; ...
17 A teachers differed from their raters for item one.


Item Total 

Number* A B Cc 

13 17 18 15 

11 16 16 15 

14 18 17 16 

20 15 23 22 

13 17 14 14 

21 16 16 21 

23 17 21 24 

19 15 18 21 

17 16 16 21 

14 18 18 18 

14 13 12 14 

10 12 5 il 

1 16 20 24 19 
il. 15 19 16 17 
12. 18 22 20 22 
13. 15 18 18 18 

14 16 16 21 24 
15. 13 18 15 21 
16a. 9 15 15 17 
16b. 9 5 7 
16c. 12 14 17 
16d. il 10 10 
l6e. 11 10 12 
17. 3 2 3 
18. 
19. 9 1 1 3 
20. 10 12 8 
21. 24 20 21 
22. 9 7 12 
23. 6 6 10 
24a 4 8 9 
24b. 9 15 13 
24c. 5 8 6 
24d. 13 16 ll 
24e. 3 4 5 
25. 15 18 18 
26. 7 12 9 
27. 15 20 17 
28. 8 11 12 
29. 2 3 4 
30. 6 6 6 
31. il 12 ll 
32. 12 ll 17 20 18 
33. 10 ll 17 17 18 
34. ll ll 13 15 16 
35. 12 14 13 18 18 
36. 6 6 9 8 9 
‘ 37. 3 3 6 8 6 
38. 3 5 6 9 9 
39. 10 5 11 23 15 
40. 12 8 10 15 14 
41. 5 5 3 8 6 
i 42. 8 8 12 16 il 




TABLE XV 


ITEM ANALYSIS OF BELIEFS RELATED TO EDUCATION: FRE- 
QUENCIES OF OPPOSITIONAL AND TOTAL DIFFERENCES 


Oppositional Total 
Differences Differences 
A B 


17 19 
21 
19 
22 
17 
15 
11 
21 
19 
13 


18 
20 
17 
15 
25 
28 
20 
12 
18 
22 


11. 
12. 
13. 
14. 
15. 
16. 
17. 
18. 
19. 
20. 


19 
25 
20 
22 
17 
21 
14 
16 
ly 
23 


21 
23 
24 
20 
24 
25 
15 
16 
22 


25 
18 
23 
12 
24 
21 
12 
22 
22 
i8 



Number A Cc 
12 8 9 19 
9 12 11 23 
10 11 14 17 
11 13 10 20 
19 6 9 20 
7 7 5 14 
1 6 4 10 
9 12 li 17 
10 12 15 
0 0 0 9 
5 7 8 15 15 
10 8 & 20 16 
4 4 § 15 13 
4 4 5 
13 13 il 22 23 
13 14 15 25 24 
12 14 16 21 18 
1 4 2 10 10 
4 4 8 20 19 
15 10 14 24 22 
10 a 22 21 
10 10 11 23 22 
12 11 19 21 27 
6 4 17 15 
9 12 17 22 23 
7 15 13 
3 3 3 8 i] 
29. # 7 6 17 15 
30. 12 10 1l 18 21 
31. 14 10 11 22 19 
32. 12 ll ll 23 21 
33. 8 10 13 22 23 
34. 18 18 
35. 14 17 
36. 25 22 
37. 25 25 
38. 14 20 
39. 14 16 
40. 17 18 
41. 21 23 
42. 20 18 
43. 17 25 
44. 12 15 
45. 20 23 
46. 17 23 
47. 13 19 

48. 23 20 . 
49. 19 19 
50. 16 19 




TABLE XV (Continued) 


Oppositional 
Differences 


9 
7 
6 
1 
7 
5 
5 
4 
2 


& +b 
ono ea NK 


sc 


Total 

Item Differences 
Numer A B Cc A B Cc 
} 51. 8 11 y 22 25 23 
§2. 12 9 15 20 23 21 
53. 3 6 1 15 25 18 
54. 12 6 6 19 17 15 
55. i) 9 9 21 23 24 
56. 10 14 i) 20 27 20 
57. 10 7 13 20 20 23 
58. 10 13 10 18 23 25 
59. 11 y 14 25 26 25 
66. 6 7 9 22 18 19 
61. 9 7 10 21 23 22 
62. 15 8 15 27 22 25 
63. 8 10 8 20 24 23 
64. 13 11 15 23 23 26 
65. 9 12 12 25 24 22 
66. 15 14 13 26 26 27 
67. 5 5 5 17 17 18 
68. 5 17 13 17 25 22 
69. 12 13 15 24 25 24 
70. 11 9 13 21 23 23 
5 14 14 14 22 24 24 
72. 14 11 19 16 16 
73. 9 7 25 21 26 
74. 6 7 20 20 20 
75. 6 10 | 18 24 24 
76. 8 18 19 17 
77. 5 14 20 16 
78. 5 15 21 19 
79. 13 21 21 22 
80. 11 24 25 21 
61. 3 8 13 12 
82. 10 23 24 25 
83. 2 12 12 11 
84. 1 16 20 12 
85. 4 19 22 21 
86. 12 24 23 19 
87. 5 15 19 22 
68. 11 20 15 14 
89. 12 24 22 19 
15 20 23 23 

1} 20 19 24 

10 25 18 22 

8 22 24 23 

11 26 22 23 

7 21 18 15 

5 17 16 18 

15 17 21 23 

1 ll ll 13 

7 21 23 24 

13 22 22 24 

14 21 23 27 ; 
ll 23 21 23 




TABLE XV (Continued) 


Oppositional Total 
Differences Differences 
B B 


13 24 
15 22 
8 23 
14 23 
16 
20 
11 
24 


> 


13 
13 
10 
13 

6 
16 

3 
12 


25 
19 
16 
13 

9 
20 
26 
13 
22 
12 



Number Cc 
103. 22 

104, 22 
105. 23 
106. 22 
107. 13 
108. 20 
109. 10 
110. 24 
111. 15 26 26 
112. 11 15 19 
113. 5 14 14 
114. 3 12 12 
115. 3 12 15 
116. 9 21 21 
117. 7 20 20 
118. 3 8 12 
119. 11 24 26 
120. 3 1l 13 




ers was 67 percent or less of the number of B
or number of C teachers, whichever is less,
who oppose their rater. Seven items, those
numbered 7, 18, 40, 44, 68, 75, and 84, fit the
definition.

In analyzing the total differences frequencies,
A teacher items were defined as any item on
which the number of A teachers who disagree
with their raters was 67 percent or less of the
number of B or number of C teachers, whichever
is less, who disagree with their raters.
Items numbered 81, 105, 108, and 118 fit the
definition.

In analyzing the oppositional frequencies for
C teacher items, they were defined as any item
on which the number of C teachers who oppose
their rater is 150 percent or more of the
number of A or number of B teachers, whichever
is more, who oppose their raters. Four
items, numbered 19, 24, 91, and 102, fit the
definition.

In analyzing the total differences frequencies
for C teacher items, they are defined as those
on which the number of C teachers who disagree
with their raters is 150 percent or more of the
number of A or number of B teachers, whichever
is more, who disagree with their rater.
There are no such items.
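
The two definitions above can be sketched in a few lines of Python. This is only an illustration of the percentage rules as stated in the text; the function names and the sample counts are hypothetical, and the guard against a zero comparison count is an added assumption, since the text does not say how such items were handled.

```python
def is_a_teacher_item(a, b, c, pct=67):
    """True when the A count is pct percent or less of the
    smaller of the B and C counts (A teacher item)."""
    smaller = min(b, c)
    # Assumption: an item with a zero comparison count is not classified.
    return smaller > 0 and a * 100 <= pct * smaller


def is_c_teacher_item(a, b, c, pct=150):
    """True when the C count is pct percent or more of the
    larger of the A and B counts (C teacher item)."""
    larger = max(a, b)
    # Assumption: an item with a zero comparison count is not classified.
    return larger > 0 and c * 100 >= pct * larger


# Illustrative counts only (A, B, C teachers disagreeing on one item).
print(is_a_teacher_item(8, 13, 12))   # 8 is 67 percent or less of 12
print(is_c_teacher_item(4, 4, 8))     # 8 is 150 percent or more of 4
```

The same two functions apply to oppositional frequencies and to total differences frequencies; only the counts supplied differ.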

Analysis of divergencies of disagreements.— 
Two analyses of the weights assigned to the
disagreements were made. The first considered
the weights assigned to oppositional
disagreements. The second considered all assigned
weights.

In the first analysis an oppositional weight
for each item was computed for the A teachers
by summing the weights assigned to the
oppositional disagreements between these teachers
and their raters. For the second analysis total
weights for each item were computed for the A
teachers by summing the weights assigned to all
disagreements between these teachers and their
raters. Similarly, oppositional weights and
total weights were computed for each item for the
B and C teachers. These data are shown in
Table XVI.
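
The summing procedure may be sketched as follows. The record layout (rating group, item number, assigned weight, and a flag marking oppositional disagreements) and the sample weights are hypothetical; the weights actually used are those assigned in Part One of this section.

```python
from collections import defaultdict


def weight_totals(disagreements):
    """Sum assigned weights per (group, item).

    Each record is (group, item, weight, is_oppositional).
    Returns (oppositional_weights, total_weights) as dictionaries.
    """
    oppositional = defaultdict(int)
    total = defaultdict(int)
    for group, item, weight, is_oppositional in disagreements:
        total[(group, item)] += weight          # every disagreement counts
        if is_oppositional:
            oppositional[(group, item)] += weight  # oppositional ones only
    return dict(oppositional), dict(total)


# Hypothetical records for illustration.
sample = [
    ("A", 7, 2, True),
    ("A", 7, 1, False),
    ("B", 7, 3, True),
    ("A", 18, 1, False),
]
opp, tot = weight_totals(sample)
print(opp)
print(tot)
```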

In analyzing the oppositional weights, A
teacher items were defined as those on which
the oppositional weight of the A teachers was 67
percent or less of the oppositional weight of the
B or C teachers, whichever is less. Nine items,
numbered 7, 18, 40, 44, 68, 75, 84, 103, and
108, fit the definition. Only one item, number
103, was not detected using the analysis for
frequencies.

In analyzing the total weights for A teacher
items, they were defined as any item on which
the total weight of the A teachers was 67 percent
or less of the total weight of the B or C
teachers, whichever is less. Items number 7 and
68, both found previously, were found to be
critical.

In analyzing the oppositional weights for C 
teacher items, they were defined as those on 
which the oppositional weight for the C teacher 
was 150 or more percent of the oppositional 
weight of the A or B teacher, whichever is more. 
Items numbered 19, 24, 91, 112, and 120 were 
found to be critical. Items 112 and 120 had not 
been detected using other analyses. 

In analyzing the total weights for C teacher 
items, they were defined similarly to the pat- 
tern above. No new items were found. 

Conclusions to the Item Analysis of the Inven- 
tory of Beliefs. —The data seems to support the 
hypothesis that there are certain items on which 
A teachers disagree with their raters less fre- 
quently than do either B or C teachers. Twelve 
such items were found. They are: 


7. I believe that pupils should be permitted
to call teachers by their nicknames or
given names.

18. I believe that teaching offers a wide
variety of interesting experiences.

40. I believe that today’s schooling makes
too many students consider unskilled and
semiskilled positions as not good enough
for them.

44. I believe that teachers should teach
students to side with the majority on
controversial issues.

68. I believe that in hiring an individual for
a job, it is often advisable to include
race, color, and religion, in making your
selection.

75. I believe that it is reasonable to fire a
teacher who admits he is a Socialist.

81. I believe that private profits are
essential to any successful economic system.

84. I believe that free enterprise has proved
its superiority over other types of
economic enterprises for America.

103. I believe that churches cause needless
strife by over-emphasizing the
differences among groups.

105. I believe that churches should take
better care of their own parishioners rather
than spend money on missions in foreign
countries.

108. I believe that public schools should
provide released time from classes for
religious instruction.

118. I believe that working with people is
better than working with things.


Further, the data seem to support the hy- 
pothesis that on certain items, C teachers dis- 
agree with their raters more frequently than do 
either A or B teachers. Six such items were 
found. They are: 




TABLE XVI 


ITEM ANALYSIS OF BELIEFS RELATED TO EDUCATION: 
WEIGHTED SCORES OF OPPOSITIONAL AND TOTAL 
DIFFERENCES


Oppositional Total 
Weights Weights 
B Cc 


27 12 
39 38 
34 45 
41 30 
21 32 
26 17 
20 15 
38 37 
37 29 

0 0 


26 24 
29 26 
il 18 
12 17 
43 36 
43 46 
44 49 
13 7 
13 24 
30 31 


30 47 
32 31 
38 

61 

15 

53 

12 

7 

20 

31 


37 
34 
44 
27 
25 
43 
28 

0 
17 
27 


19 

3 
49 
32 
31 
24 
28 
28 
il 
30 


Number A c 
1. 38 44 41 42 
2. 26 39 48 50 
3. 37 43 44 48 
4. 41 52 50 41 
5. 36 47 32 44 
6. 24 42 38 31 
7. 4 12 25 21 
8. 29 35 47 43 
9. 36 50 44 37 
10. 0 9 13 9 
11. 18 29 38 32 
12. 35 50 47 38 
13. 14 28 28 26 
14, 13 18 24 22 
15. 44 54 58 49 
16. 38 53 61 59 
17. 40 50 47 51 
18. 4 14 21 15 
19. 13 35 31 39 
20. 31 46 44 49 
21. 53 65 42 58 
22. 28 46 49 45 
23. 35 49 41 51 
24. 35 47 49 73 
25. 26 37 28 28 
26. 30 46 51 62 
27. 27 33 31 22 
28. 11 17 25 13 
29. 23 32 36 30 
30. 32 39 49 42 
31. 48 32 57 48 47 
32. 36 35 48 47 46 
33. 26 32 45 49 58 
34, 42 30 49 42 39 
35. 27 8 34 17 36 
36. 55 40 66 52 53 
37. 45 37 57 54 48 
38. 6 13 20 28 26 
39. 16 17 27 31 29 
40. 18 33 36 49 41 
41. 46 24 56 50 43 
42. 6 12 25 28 24 
43. 27 38 37 50 62 
44. 13 20 21 28 40 
45. 23 27 41 48 47 
46. 24 17 35 37 41 
47. 25 12 33 31 42 
48 23 16 46 40 46 
49. 8 12 31 37 33 
50. 22 25 33 39 47 




TABLE XVI (Continued) 


Oppositional 
Weights 
B Cc 


36 30 
28 46 
22 2 
18 21 
28 27 
46 31 
21 43 
40 33 
27 45 
19 28 


23 35 
28 53 
30 28 
36 50 
38 37 
43 36 
19 17 
61 39 
37 43 
25 38 


45 
35 
26 
16 
34 
20 
15 
16 
48 
37 


11 
22 

4 

6 
13 
29 

7 
25 
36 
51 


36 
30 
26 
31 
22 
18 
44 

3 
22 
38 


Number A A B c 
51. 24 41 §2 48 
52. 34 43 46 54 
53. i) 26 44 26 
54. 34 42 34 33 
55. 25 41 45 44 
56. 33 47 61 46 
57. 32 46 39 56 
58. 31 41 56 54 
59. 35 53 47 60 
60. 20 40 35 41 
61. 31 47 43 54 
62. 48 66 48 68 
63. 27 45 50 47 
64. 43 54 52 64 
65. 28 49 53 49 
66. 48 62 57 53 
67. 18 36 33 35 
68. 17 30 70 49 
69. 37 32 50 53 
70, 33 44 41 49 
71, 46 44 54 54 
72. 53 42 60 49 44 
_ 73. 26 24 48 44 57 
74. 16 22 38 40 35 
75. 21 34 36 52 55 
76. 30 29 43 47 35 
77. 16 16 31 35 30 
78. 17 18 30 38 37 
79. 43 20 52 39 59 
80. 35 38 55 55 49 
81. 11 6 17 20 22 
82. 30 52 44 64 44 
83. 6 16 
84, 4 16 25 38 22 
85. 12 29 32 49 35 
86. 42 37 60 52 43 
87. 17 22 30 39 35 
88. 38 28 49 35 32 
89. 40 45 53 53 45 
90. 50 29 58 49 61 
91. 16 13 33 32 52 
92. 25 21 44 37 45 
93. 19 «34 
95. 30 20 46 37 31 
96. 27 21 40 37 34 
97. 33 44 42 52 53 
98. 3 7 16 619 
99. 17 20 36 42 42 
100, 23 27 40 45 54 


TABLE XVI (Continued) 


Oppositional 
Weights 
B Cc 


45 46 
38 36 
45 53 
46 57 
23 32 
50 48 
22 17 
39 36 
7 

53 


47 
39 


Total 
Number A A B Cc 
101. 34 45 55 59 
102. 36 49 49 48 
103. 47 61 60 62 
104. 43 56 54 62 
105. 33 47 42 48 
106. 48 62 64 58 
107. 21 30 35 28 
108. 51 61 49 45 
109. 10 20 14 17 
110. 40 55 59 63 
ill. 32 25 | 49 46 62 
112. 15 20 26 34 47 
113. 8 16 16 21 30 26 
114, 14 7 11 26 22 24 
115 7 9 10 19 17 25 
116. 24 17 25 41 36 40 
117. 33 39 22 45 59 41 
118. 6 19 9 13 27 18 
119 38 43 37 55 55 57 
120. 3 0 10 14 14 22 




19. I believe that in teaching, promotions
are based on who you know rather than
on what you know.

24. I believe that teachers should be free 
to use alcoholic beverages. 

91. I believe that trade unions have done 
more harm than good in our industrial 
progress. 

102. I believe that people who claim to be 
religious are less tolerant than people 
who do not claim to be religious. 

112. I believe that most people will take ad- 
vantage of you. 

120. I believe that a large amount of money 
is a prerequisite to success. 


A more detailed interpretation of the items 
and their implications is presented in the sum- 
mary and conclusions to Part Three. 

Teacher Factors. --This instrument consisted
of 25 teacher factors which subjects classified
on a five-point scale from “utmost importance”
to “insignificant.” The responses of the
teachers were compared with those of their
raters. When differences existed between the
responses of the teacher and those of her rater,
weights were assigned as in Part One of
this section. Plus and minus designators were
added similarly to the procedure used in the
Compensated Score analysis.

Two approaches were followed in analyzing 
the positive and negative weights. The first 
approach considered frequency of weights with- 
out regard to divergencies. The second ap- 
proach considered both frequencies and diver- 
gencies. Four specific procedures were fol- 
lowed to analyze the data prepared for each of 
the two approaches. Critical items are quoted 
in the summary and conclusions of the item an- 
alysis for Teacher Factors. 

Procedures for Analyzing the Frequencies 
of Disagreements. —Following the first ap- 
proach, frequencies of positive weights and neg- 
ative weights were found for the A teachers for 
each item. Similarly, frequencies of positive
and negative weights were found for the B and
C teachers for each item. These data are shown
in Table XVII.

In analyzing the frequencies of positive
weights in Table XVII for A teacher items, it
was hypothesized that A teachers would place
more importance on the teacher factors than
would B or C teachers. Therefore, for the first
specific procedure, A teacher items were defined
as items on which the number of A teachers
who classified the factor as more important
than did their raters is 200 percent or more
of the number of B or number of C teachers,
whichever is higher, who classified it as
more important than did their raters. There
are no such items.




For specific procedure number two, A teach- 
er items were defined as items on which the 
number of A teachers who classified the factor 
as less important than did their raters is 50 or 
less percent of the number of B or number of C 
teachers, whichever is less, who classified it 
as less important than did their raters. Items 
9, 12, and 16 fit the definition. 

In analyzing Table XVII for C teacher items, 
it was hypothesized that C teachers would place 
less importance on the teacher factors than 
would A or B teachers. Therefore, for specific 
procedure number three, C teacher items were 
defined as items on which the number of C teach- 
ers who considered the factor more important 
than did their raters is 50 or less percent of the 
number of A teachers or number of B teachers, 
whichever is less, who consider the factor as 
more important than did their raters. There 
are no such items. 

For specific procedure number four, C teacher
items were defined as items on which the number
of C teachers who classified the factor as
less important than did their raters is 200
percent or more of the number of A or number of
B teachers, whichever is more, who classified
it as less important than did their raters. There
are no such items.

Procedures for Analyzing Divergencies and 
Frequencies. —Following the second approach, 
a positive weighted score for A teachers was 
computed for each item by summing the positive 
weights assigned to the differences between
the responses of the A teachers and their raters. 
Likewise, a negative weighted score for A teach- 
ers was computed for each item by summing the 
negative weights assigned to the differences be- 
tween them. Similarly, sets of positive and neg- 
ative weighted scores for each item were com- 
puted for the B and C teachers. These data 
are shown in Table XVIII. 

Continuing the study of the same hypothesis
proposed for the analysis of the frequencies of
disagreements, for specific procedure number
five, A teacher items were defined as items on
which the positive weighted score of the A teachers
is 200 percent or more of the positive weighted
score of the B or C teachers, whichever is
higher. No item fits the definition. However,
one item, number 21, seems to reverse
the hypothesis to a marked degree in that the
positive weighted score for A teachers was 50
percent of the B teachers’ score and 55 percent of
the C teachers’ score.

For specific procedure number six, A teach- 
er items were defined as items on which the neg- 
ative weighted score of the A teachers is 55 or 
less percent of the negative weighted score of 
the B or C teachers, whichever is less. Items 
numbered 9, 12, 24, and 25 fit the definition. 

Continuing the C teacher aspect of the hy- 




TABLE XVII 


ITEM ANALYSIS OF TEACHER FACTORS: FREQUENCIES OF ASSIGNED 
POSITIVE AND NEGATIVE WEIGHTS 


A Teachers B Teachers C Teachers 
Pos. Neg. > Neg. Pos. Neg. 


— 
ceo 


2 
3 
8 
9 
8 
7 
8 
4 
1 
9 
1 
4 
7 
0 
5 
5 
5 
0 
8 


Item 
5 
2. 18 
; 3. 14 
4. 13 
5. 9 
6. 12 
13 
8. 11 
9. 7 
10. 14 
11. 13 
12. 17 
13. 10 
14. 8 
15. 5 
16. 11 
17. 12 
18. 14 
19. 11 | 
20. 14 11 6 
21. 6 8 12 
22. 7 8 10 
23. 14 15 5 
24. 16 12 9 
25. 10 11 ll 




TABLE XVIII 
TEACHER FACTORS: ITEM ANALYSIS OF WEIGHTED SCORES 


A Teachers B Teachers C Teachers 
Item Pos. Neg. Pos. Neg. Pos. Neg. 
Number Score Score Score Score Score Score 


8 7 
27 23 
17 18 
17 20 
il 9 
17 12 
17 19 


5 
25 
17 
18 
13 
14 
18 


11 14 

6 6 
19 14 
18 15 
26 25 
18 17 
18 15 


5 3 
13 15 
12 ll 
15 15 
15 13 


15 

8 
14 
14 
26 
13 
13 


6 
13 
13 
17 
14 


17 16 
14 13 
20 ll 
19 19 
19 17 
20 16 


16 
7 
9 

16 

22 

14 



1. 7 
7 
3. 9 
4. ll 
5. 15 

6. 10 
7. 4 
8. 9 
9. 8 
10. 10 
11. 5 
12. 3 
13. 10 
14, 11 
15. 7 
16. 4 
17. 1 
18. 10 
19. 13 
20. 8 
21. 15 
22. 12 
23. 6 
24. 10 
25. 16 




pothesis stated above, for specific procedure 
number seven, C teacher items were defined as
items on which the positive weighted score of 
the C teachers was 50 or more percent of the 
positive weighted score of the A or B teachers, 
whichever is lower. There are no such items. 

For specific procedure number eight, C 
teacher items were defined as items on which 
the negative weighted score of the C teachers is 
200 or more percent of the negative weighted 
score of the A or B teachers, whichever is high- 
er. There are no such items. 

Summary of the Eight Specific Procedures.— 
Differences between the points of view of the 
teachers and their raters were assigned posi- 
tive weights when the teacher classified the fac- 
tor as more important than did her rater, and 
negative weights when the teacher classified it 
as less important than did her rater. For each 
item of the Teacher Factor instrument frequen- 
cies of positive and negative weights were ob- 
tained for the A, B, and C teachers. Secondly, 
for each item positive weighted scores and neg- 
ative weighted scores were obtained for the A, 
B, and C teachers. In greatly abbreviated form 
the eight specific procedures may be summar- 
ized with percentages and type of data indicated: 


1. A is 200% of B or C using plus frequencies.

2. A is 50% of B or C using minus frequencies.

3. C is 50% of A or B using plus frequencies.

4. C is 200% of A or B using minus frequencies.

5. A is 200% of B or C using positive weighted scores.

6. A is 55% of B or C using negative weighted scores.

7. C is 60% of A or B using positive weighted scores.

8. C is 200% of A or B using negative weighted scores.
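
The abbreviated list lends itself to a small lookup table. The sketch below is illustrative only: the names are hypothetical, and the reading of each rule is an assumption drawn from the fuller definitions (a 200 percent rule compares against the higher of the other two groups and means "or more"; a 50, 55, or 60 percent rule compares against the lower and means "or less"). For procedure seven, the 60 percent figure of the summary list is used; the running text gives a different wording for that procedure, so the entry should be treated as an assumption.

```python
# procedure number -> (group tested, direction, percent)
PROCEDURES = {
    1: ("A", "at_least", 200),  # plus frequencies
    2: ("A", "at_most", 50),    # minus frequencies
    3: ("C", "at_most", 50),    # plus frequencies
    4: ("C", "at_least", 200),  # minus frequencies
    5: ("A", "at_least", 200),  # positive weighted scores
    6: ("A", "at_most", 55),    # negative weighted scores
    7: ("C", "at_most", 60),    # positive weighted scores (assumed reading)
    8: ("C", "at_least", 200),  # negative weighted scores
}


def fits(procedure, a, b, c):
    """Apply one of the eight rules to the A, B, and C values of an item."""
    group, direction, pct = PROCEDURES[procedure]
    target, others = (a, (b, c)) if group == "A" else (c, (a, b))
    if direction == "at_least":
        return target * 100 >= pct * max(others)
    return target * 100 <= pct * min(others)


# Illustrative values only.
print(fits(2, a=4, b=9, c=8))   # 4 is 50 percent or less of 8
print(fits(1, a=10, b=6, c=5))  # 10 is not 200 percent of 6
```

Values of zero in the comparison groups are not given special treatment in this sketch.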


Conclusions to the Item Analysis of Teacher 
Factors. —The data seem to support some as- 
pects of the hypothesis, stated earlier, and not 
to support other aspects of it. There are cer- 
tain items on which the number of A teachers, 
who classify teacher factors as less important 
than did their raters, is decidedly less than the 
number of B or C teachers, who do so. Three 
such items were found. They are: 


9. Provides for individual differences.
12. Has mastery of subject matter.
16. Obviously fair with pupils of all minority groups.


There are four items on which the degree of
divergencies of the responses of the A teachers,
who classified the Teacher Factors as less
important than did their raters, is decidedly less
than the degree of divergencies of the responses
of the B and C teachers, who did so. The four
items are 9 and 12, stated above, and:


24. Skillful in teacher-parent relationships. 
25. Assists in care and improvement of the 
school equipment, buildings, and grounds. 


There was one item on which the hypothesis 
seemed to be reversed. The degree of diver- 
gency of the A teachers, who classified the fac- 
tor as more important than did their raters, was 
decidedly less than the degree of divergencies of 
the B or C teachers, who did so. This item is: 


21. Offers thoughtful comments and criti- 
cisms for improvement of the school. 


The data do not support the hypothesis in
regard to C teacher items. On no items of the
Teacher Factor instrument is the variation
between the C teachers and their raters decidedly
different from the variations between the A
teachers and B teachers and their raters.

Individual items and their implications are 
discussed in more detail in the summary and 
conclusions to Part Three. 


Summary and Conclusions to Part Three; It- 
em Analysis. —Item analyses of all three meas- 
ures used to study professional distances were 
made. It was thought that if on certain items
the number of teachers rated good who disagreed
with their raters was very low as compared with
the number of teachers rated average or poor
who disagreed with their raters, such items may
be considered critical in that agreement on these
items with one’s rater may be associated with a
teacher being considered good. Such items were
called A teacher items. Similarly, it was thought
that if on certain items the number of teachers
rated poor who disagreed with their rater was
very high as compared with the number of teachers
rated good or average who disagreed with
their raters, then those items may be considered
critical in that disagreement on these items
with one’s rater may be associated with a teacher
being considered poor. Such items were
called C teacher items.

The disagreements between raters’ and
teachers’ responses to each item were analyzed
using several different approaches. For each
approach definitions of A teacher and C teacher
items were established and applied to the data.
Of the 53 items on the Teacher Practices
instrument, six were A teacher items, and six
were C teacher items. Of the 120 items on the
Inventory of Beliefs, twelve were A teacher
items, and six were C teacher items. Of the
25 items on the Teacher Factor instrument,
five were A teacher items, and none was a C
teacher item. The number of critical items on
the three instruments would suggest that of the
198 items studied, 23 are items on which teachers
rated good agree with their raters to a
decided degree more frequently than do teachers
rated average or poor, and 12 are items on
which teachers rated poor disagree with their
raters to a decided degree more frequently than
do teachers rated good or average. A total of
35 items were found to be critical.
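
The totals quoted in this paragraph can be checked with a short tally. The figures are those given in the text; the variable names are illustrative.

```python
# instrument -> (items, A teacher items, C teacher items), as reported
instruments = {
    "Teacher Practices": (53, 6, 6),
    "Inventory of Beliefs": (120, 12, 6),
    "Teacher Factors": (25, 5, 0),
}

total_items = sum(n for n, _, _ in instruments.values())
a_items = sum(a for _, a, _ in instruments.values())
c_items = sum(c for _, _, c in instruments.values())
critical = a_items + c_items

print(total_items, a_items, c_items, critical)
```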

The A teacher items, numbers 2 and 3 of the
Teacher Practices instrument, suggest that
teachers considered good agree with their raters
on where or where not to stand while teaching.
In contrast, four of the C teacher items
found on the Teacher Practices instrument
suggest that teachers rated poor disagree with
their raters on where or where not to stand
(item 2) and where or where not to sit (items 4,
5, and 6d).

The A teacher item, number 23 on the same
instrument, suggests that teachers rated good
tend to agree with their raters on the evaluation
of the psychological arrangement of subject
matter, while the C teacher item, number 22,
suggests that teachers rated poor tend to
disagree on the evaluation of organizing subject
matter in problem-project form. Related to
these practices seems to be the A teacher item,
number 12 of the Teacher Factors instrument.
Apparently, teachers rated good place more
importance on the mastery of subject matter than
do B or C teachers. Mastery of subject matter
would be necessary for rearrangement of it into
psychological unit form.

Agreement with one’s rater on the role of the
teacher concerning assignments seems to be a
critical area. Item numbers 24a and 24c of
Teacher Practices suggest that teachers rated
good agree with their raters on the evaluation
of page to page textbook assignments and general
topic assignments with nothing more, while
item 19 of the same instrument suggests that
teachers rated poor disagree with their raters
on the evaluation of providing for individual
differences by differentiating assignments. The A
teacher item number 9 of the Teacher Factor
instrument seems related to this area. It suggests
that A teachers tend to place more importance
on providing for individual differences than
do B or C teachers.

The last A teacher item of the Teacher Prac- 
tices was measuring the results of learning by 
changed pupil attitudes and behaviors. Appar- 
ently, teachers rated good agree with their 
raters on the evaluation of this practice more 
frequently than do B or C teachers. 

Agreement with one’s rater on attitudes
toward minority groups was found to be a critical
area. Items 16 of Teacher Factors and 68
of the Inventory of Beliefs are in this category.
Apparently, A teachers agree with their raters
on the importance of being fair with pupils of all
minority groups and believing or disbelieving
that it is advisable to include race, color, and
religion in hiring applicants for a position. The
responses of C teachers did not differentiate
themselves from B teachers in this area.

The topic of a teacher as a member of a
professional staff offered three critical items.
Assuming that differences among the variation
between teachers’ and their raters’ classification
of Teacher Factors indicate differences among
A, B, and C teachers in their stress on their
behaviors, it would seem that A teachers stress
being skillful in teacher-parent relationships
and assisting in the care and improvement of
school equipment, buildings, and grounds more
than do B or C teachers, and they stress offering
thoughtful comments and criticisms for
improvement of the school less than do B or C
teachers.

Agreements and disagreements on religion
were shown to be critical. In this area, three
items, numbers 103, 105, and 108 of the Inventory
of Beliefs, were A teacher items, and one
item, number 102 of the instrument, was a C
teacher item. Apparently, teachers rated good
tend to agree with their raters more often and
to a greater extent on believing or disbelieving
that churches cause needless strife by
over-emphasizing differences among groups, that
churches should take greater care of their
parishioners rather than foreign missions, and that
schools should release school time for religious
instruction. Teachers rated poor tend to
disagree more frequently on believing or disbelieving
that people who claim to be religious are
less tolerant than people who do not claim to be
religious.

A teachers tend to agree with their raters on
items eulogizing the profession. Items number
18 and 118 of beliefs were found to be A teacher
items. Apparently, teachers rated good agree
with their raters on believing or disbelieving
that teaching offers a wide variety of interesting
experiences and that working with people is
better than working with things. On the other hand,
C teachers tend to disagree with their raters
on beliefs suggesting less commendable attitudes
toward the profession. Items 19 and
112 of beliefs were found as C teacher items.
Apparently, teachers rated poor disagree with
their raters on believing or disbelieving that in
teaching, promotions are based on who you
know rather than on what you know and that
most people will take advantage of you.

Several critical items were found in an area
that may be named Americanism and economics.
On two items, numbers 75 and 81 on beliefs,
teachers rated good were found to agree more
frequently with their raters on believing
or disbelieving that it is reasonable to fire a
teacher who admits he is a Socialist and that
private profits are essential to a successful
economic system. Concerning labor, item
number 40 was found as an A teacher item and
numbers 91 and 120 of beliefs were found as C
teacher items. Teachers rated good apparently
tend to join their raters in believing or
disbelieving that schools today make too many
students consider unskilled and semi-skilled
positions as not good enough for them, while
teachers rated poor tend to disagree with their
raters on believing or disbelieving that trade
unions have done more harm than good in our
industrial progress and that a large amount of
money is a prerequisite to success. An A teacher
item somewhat related to this area was number
44. Teachers rated good tend to agree with
their raters in believing or disbelieving that
teachers should teach students to side with the
majority on controversial issues.

Two items dealt with the personal conduct 
of teachers. The A teacher item, number 7 of 
beliefs, suggests that teachers rated good tend 
to agree with their raters in believing or disbe- 
lieving that pupils should be permitted to call 
teachers by their given names. Item number 
24 of beliefs suggests that teachers rated poor 
tend to disagree with their raters on believing 
or disbelieving that teachers should be free to 
use alcoholic beverages. 

No attempt has been made in this study to
isolate the teacher behaviors that are good or
poor. It was the intention of the item analysis
to determine on what items A teachers tend to
agree with their rater more frequently than do
B or C teachers, and on what items C teachers
tend to disagree with their raters more frequently
than do A or B teachers. Such A and C teacher
items have been found and reviewed. It is
not implied that the 35 items presented here
are the only critical items for all raters. It is
quite likely that certain raters have many more
critical items while others have many fewer.
Still others may have 35 different ones. The
35 critical items reviewed here were derived
from 198 rather arbitrarily selected items by
a rather arbitrarily, though logically, selected
means.

It would seem that, in the evaluating of teach- 
ing, such minor aspects as where a teacher 
stands or sits may tend to attract the attention
of the rater, especially if the teacher is sit- 
ting when he thinks she should be standing. How- 
ever, it does not seem reasonable to believe 
that performing contrary to the rater’s expecta- 
tion on this issue would be sufficient reason to 
rate a teacher as poor. 

Other items such as organization and mastery
of subject matter, attention to individual
differences, fairness to minority groups, and
being a member of a professional group seem
to be more significant items. Performing up to
or surpassing the rater’s expectations on such
issues is undoubtedly sufficient reason to be
considered good, while performing contrary to or
short of expectations on these issues seems
sufficient reason to be considered poor. If this is
so, it is probably significant that teachers rated
poor disagree with their raters more frequently
than do A or B teachers on the evaluation of
these teacher practices and on the importance
of these teacher factors. It may be that
pre-service and in-service education of teachers
should place more emphasis on attitude formation
toward these factors and practices along
with knowledge and skill concerning them. That
these factors are important has been suggested
by other studies. That lack of performance in
these areas is considered cause for failure
and dismissal has been shown by Buellesfield
(9), Madsen (21), and Nemec (22).

On the other hand, disagreements on relig- 
ious items, eulogizing attitudes toward the pro- 
fession, Americanism and economics, and per- 
sonal conduct of teachers tend to remind one that 
performing the role of teacher is much more 
than performing instructional duties, It is prob- 
able that teacher evaluations are based, in part, 
on how closely one’s behavior approximates the 
expectations of one’s rater on certain critical 
areas. That proper behavior in these areas is
important has been suggested by the studies of
Edmiston (12), Buellesfield (9), and Madsen
(21). It is likely that pre-service education and
orientational activities for beginning teachers 
may be modified to stress the broader aspects 
of the role of teacher. Further, it is possible 
that more compatible intra-staff relationships
may be attained by selection and placement of
personnel based on like beliefs on those issues
found critical with the rater. 

Lastly, it would seem that frequency and di- 
vergency of disagreements on certain issues 
between the rater and ratee are a factor 
in the teacher's evaluation. Therefore, to ob- 
tain a more accurate rating from the profession- 
al distance point of view, professional distances 
between rater and ratee should be measured and,
if possible, kept constant, in the evaluation of 
teachers. 

Section Summary. —Using a modification of 
the causal-comparative research method, the 
three data gathering instruments were analyzed
from three points of view. 

The first studied the hypothesis that profes- 
sional distance, i.e., frequency and divergency 
of disagreements, increased as ratings de-
creased from good to average and average to
poor. It was found that the data presented here
did not completely support such a hypothesis.
Depending upon the specific method of analysis 
and the instrument, the number of schools in 
which A teachers are a shorter professional 
distance from their raters than are the B or
C teachers varied from 9 to 16 of the 30 schools 
studied. Likewise, the number of schools in 
which C teachers are further from their raters 
than are the A or B teachers varied from 8 to 
17. It was suggested that raters may accept 
alternate points of view on items known to be 
controversial without decreasing the teacher’s 
rating. Further, it was suggested that the man- 
ner of disagreeing may be an important factor 
along with the number of disagreements. Last- 
ly, it was suggested that some raters are more 
tolerant of conflicting points of view than others. 

The second approach studied the frequency 
of disagreements without regard to degree of 
divergencies. It was hypothesized that the 
teacher rated poor was the one who most fre- 
quently disagreed with the rater, while the 
teacher rated good most frequently agreed. The 
data seemed to indicate that frequency of disa- 
greement slightly increases as ratings increase 
from poor to average and average to good. How-
ever, frequency of disagreement slightly de-
creases as ratings decrease from good directly
to poor. Such conclusions do not entirely con- 
tradict one another. It is suggested that teach- 
ers rated average are not necessarily between 
good and poor, a hypothesis suggested by other 
research.

The third approach was an item analysis. It
was hypothesized that on certain items A teach- 
ers agree with their raters more frequently and 
with less divergence than do B or C teachers, 
and that on certain items C teachers disagree 
with their raters more frequently and with great-
er divergence than do A or B teachers. Of the
items studied, 23 were found to support the
A teacher aspect of the hypothesis, and 12 were
found to support the C teacher aspect. Of the 
23 critical items associated with A teacher and 
rater agreement, six concerned Teacher Prac- 
tices, 12 concerned beliefs related to education, 
and five concerned teacher factors. The twelve 
critical items associated with C teacher and rat- 
er disagreement were distributed equally be- 
tween teacher practices and beliefs related to 
education. None was concerned with the im- 
portance of teacher factors. 

It was suggested (1) that pre-service and in-
service education of teachers increase the em-
phasis given to attitude formation toward such
critical issues as individual differences, or-
ganization and mastery of subject matter,
fairness to minority groups, and being a mem- 
ber of a professional staff, (2) that pre-service 
education and orientation of beginning teachers 
into the profession be modified to increase the
emphasis on the broader aspects of teaching as
a way of life, to include such areas as the teach- 
er’s role on religious issues affecting the school, 
eulogizing attitudes toward the profession, Amer- 
icanism and economics, and personal conduct of 
teachers, (3) that, perhaps, more congenial in- 
tra-staff relationships may be attained by place- 
ment of personnel based on their points of view 
on critical issues, and (4) that professional dis-
tance on critical items must be considered in
the evaluation of teachers to get more accurate 
ratings from the professional distance point of 
view. 


SECTION IV 


Summary and Conclusions 


THIS STUDY attempts to show the rela- 
tionship of professional distance to teacher rat- 
ings. Professional distance, adapted from the 
sociological term, social distance, is defined 
as the frequency and divergency between points 
of view held by professional workers on what 
constitutes the professional role of the good 
teacher. The greater the divergence and the 
more frequent the disagreements, the longer the 
professional distance; the lesser the divergence 
and fewer the disagreements, the shorter the 
professional distance. Professional role, adapt- 
ed from the sociological term, social role, is 
defined as the overt and covert behaviors re- 
quired of a person in a specific professional po- 
sition. One’s concept of the professional 
role of the good teacher, or aspects of it, serves 
as one’s criterion to evaluate teaching. The 
evaluation of one’s own teaching is an expres-
sion denoting the difference between one’s con-
cept of one’s performance and one’s concept of
the professional role of the good teacher. The 
evaluation of another’s teaching is an expression 
denoting the difference between one’s concept of 
another’s teaching and one’s concept of the pro- 
fessional role of good teaching. If a teacher, 
who is approximating her own concept of teach- 
ing, is evaluated by a rater whose concept of 
good teaching is quite different from hers, the
rating is apt to be poor. If the same teacher is 
evaluated by a rater whose concept of good teach- 
ing is similar to that of the teacher’s, then the 
rating is apt to be good. Therefore, it is hypoth-
esized that lengths of professional distance in-
crease as ratings decrease from good to aver-
age and average to poor.

In reviewing researches in both education
and sociology, the terms, professional dis-
tance and professional role of the good teach-
er, were not found. The concept of social dis-
tance in place of ‘‘favorable’’ and ‘‘unfavorable’’
attitudes toward minority groups was reported
by Bogardus in 1928. Assigning weights to de-
grees of social distance was reported by Dodd
in 1935. In perhaps all educational research
the presence of professional distance may be 
seen. The terms, traits, ratings, pupil change, 
test scores, and college grades, all imply spe- 
cific behaviors. When these are used as criter- 
ia for good teaching, the behaviors required to 
attain desirable scores, ratings, pupil changes,
etc., are the behaviors required of persons who
would perform the professional role of the rat- 
er’s concept of good teaching. Selected studies 
of the normative survey and correlational types 
were reviewed, and evidences of professional
distances were pointed out. It was further sug-
gested that correlations between sets of teach-
er ratings increase as the likelihood for profes-
sional distances decreases.

The methodology of research used in this 
study is a modification of the causal-compara-
tive technique. The presence of the first phen-
omenon under investigation was a teacher being
considered a good teacher, and the absence of 
this phenomenon was a teacher being consider- 
ed average or poor. The second phenomenon 
was a teacher being considered a poor teacher, 
and its absence was a teacher being considered 
good or average. These definitions placed the 
average teacher in between the good and poor 
teachers. Frequencies and divergencies of dis-
agreements with one’s rater on the overt and co-
vert behaviors required of a person who per-
forms the role of the good teacher were studied
as the circumstances attendant to the presence
of both phenomena.

In the application of the research technique, 
two communities were studied. From the first
community 17 school faculties were selected. 
From the second 13 school faculties were select- 
ed. From each of the 30 faculties, the princi- 
pal served as the rater of his teachers. He, in 
turn, selected a teacher he considered good, a 
teacher he considered average, and a teacher 
he considered poor. The groups of teachers 
were named A teachers, B teachers, and C teach-
ers for convenience. All subjects were drawn
from elementary school levels. 

The measuring instruments for this study 
sought the points of view of the subjects on what 
overt and covert behaviors each required of the 
person playing the professional role of his good 
teacher. Three instruments were designed. 
The first consisted of 53 teacher practices 
which the subjects classified as ‘‘good’’, ‘‘poor’’, 
or ‘‘makes no difference’’, i.e., neither good 
nor poor. The second instrument was 120 state- 
ments of beliefs related to education. Subjects 
selected one of the following as their response: 
Yes, I definitely believe this statement; I am 
inclined to believe this statement; I cannot say; 
I am inclined not to believe this statement; and 
No, I definitely do not believe this statement.
The third instrument consisted of 25 teacher
factors which subjects evaluated on a five-point 
scale from ‘‘of utmost importance’’ to ‘‘insignif-
icant.’’ No claim was made that only these 198
items make up one’s total concept of the profes- 
sional role of the good teacher. 

All raters and teachers responded to the in- 
struments. The responses of the teachers were 
compared with those of their rater. Differences 
between the responses were quantified. Those
differences suggesting little divergence were 
weighted a small value. Those suggesting a 
greater divergence, or opposition, were as-
signed a larger value.

Weights assigned to the differences were an-
alyzed from three points of view. The first
considered professional distance, frequency and 
divergency of disagreements; the second consid- 
ered frequency without regard to divergencies; 
the third was an item analysis. Professional 
distance scores were computed for the disagree- 
ments between each rater and each teacher he 
rated by summing the assigned weights. For 
convenience, the professional distance scores 
were associated with the teacher ratings. It was 
found that the data did not completely support 
the hypothesis stated above. Depending upon
the specific method of analysis and the instru-
ment, the number of schools in which A teach-
ers are a shorter professional distance from
their raters than are the B or C teachers
varies from 9 to 16 of the 30 schools studied.
Likewise, the number of schools in which C 
teachers are a longer professional distance from 
their raters than are the A and B teachers var- 
ies from 8 to 17. It was suggested that raters 
may accept alternate points of view on items 
known to be controversial without decreasing
the teacher’s rating. Secondly, it was suggested
that the manner of disagreeing may be an import-
ant factor along with the number of disagree-
ments. Lastly, it was suggested that some
raters of teachers are more tolerant of conflict-
ing points of view than others.
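The scoring just summarized can be illustrated with a minimal sketch. The study states only that small divergences received a small weight and larger divergences a larger one; the sketch assumes, purely for illustration, that the weight is the absolute gap between two responses on a numeric scale. The function names and the sample responses are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def divergence_weights(rater: Sequence[int], teacher: Sequence[int]) -> List[int]:
    """Weight each item by the gap between the two responses.

    Assumption: the weight is the absolute difference on a numeric scale.
    The study's actual weights are not given in the text.
    """
    if len(rater) != len(teacher):
        raise ValueError("rater and teacher must respond to the same items")
    return [abs(r - t) for r, t in zip(rater, teacher)]


def professional_distance(rater: Sequence[int], teacher: Sequence[int]) -> int:
    """First analysis: frequency and divergency together (sum of weights)."""
    return sum(divergence_weights(rater, teacher))


def disagreement_frequency(rater: Sequence[int], teacher: Sequence[int]) -> int:
    """Second analysis: count disagreements without regard to divergence."""
    return sum(1 for w in divergence_weights(rater, teacher) if w > 0)


# Hypothetical responses to five belief statements on a five-point scale
# (1 = "Yes, I definitely believe" ... 5 = "No, I definitely do not believe").
rater = [1, 2, 5, 3, 4]
teacher_a = [1, 2, 4, 3, 4]
teacher_c = [5, 2, 1, 3, 2]

print(professional_distance(rater, teacher_a), disagreement_frequency(rater, teacher_a))
print(professional_distance(rater, teacher_c), disagreement_frequency(rater, teacher_c))
```

Keeping the two measures as separate functions mirrors the distinction drawn between the first and second analyses: two teachers may disagree with the rater equally often and still lie at different professional distances.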

The second analysis studied the frequencies 
of disagreements without regard to degree of di-
vergence. It was hypothesized that A teachers
disagree less frequently than do either B or C 
teachers, and C teachers disagree more fre-
quently than do either A or B teachers. The da-
ta here presented seemed to indicate that fre-
quency of disagreement slightly increases as rat-
ings increase from poor to average and aver-
age to good. However, frequency of disagree-
ment slightly decreases as ratings decrease
from good directly to poor. Such conclusions
do not necessarily contradict one another. It 
is suggested that teachers rated average are 
not necessarily between good and poor teachers. 

The third analysis studied the 198 items on 
the three instruments. It was hypothesized that
on certain items A teachers agree with their
raters more frequently and with less divergence 
than do either B or C teachers, and that C teach- 
ers disagree with their raters more frequently 
and with greater divergence than do either A or
B teachers. Twenty-three items were found to
support the A teacher aspect of the hypothesis
and twelve items supported the C teacher as-
pect. It was suggested that (1) pre-service and
in-service education of teachers emphasize at- 
titude formation toward such critical items as 
individual differences, organization and mas-
tery of subject matter, fairness toward minor-
ity groups, and being a member of a profession-
al staff, (2) that pre-service and orientation
activities for beginning teachers into the pro- 
fession increase the emphasis on the broader 
aspects of teaching, (3) that, perhaps, more 
congenial intra-staff relationships may be at-
tained by placing personnel on the basis of their
points of view on critical items, and (4) that 
professional distance on critical items must be 
considered in the evaluation of teachers in or- 
der to get more accurate ratings from the pro- 
fessional distance point of view. 


BIBLIOGRAPHY 


1. Almy, H. C. and Sorenson, H. ‘‘A Teach-
er-Rating Scale of Determined Reliability
and Validity,’’ Educational Administra-
tion and Supervision, XVI (March 1930),
pp. 179-86.

2. Barr, A. S. Characteristic Differences in
the Teaching Performances of Good and
Poor Teachers of the Social Studies
(Bloomington, Ill.: Public School Publish-
ing Co., 1929).

3. Barr, A. S. ‘‘The Measurement and Predic-
tion of Teaching Efficiency: A Summary
of Investigations,’’ Journal of Experi-
mental Education, XVI (June 1948), pp.
204-84.

4. Barr, A. S. and Emans, L. M. ‘‘What Qual-
ities are Prerequisites to Success in
Teaching?’’ Nation's Schools, VI (Sep-
tember 1930), pp. 60-4.

5. Barr, A. S., and others. Supervision: Demo-
cratic Leadership in Improvement of
Learning, second edition (New York: D.
Appleton-Century Co., 1947).


6. Bogardus, E. S. Immigration and Race At-
titudes (Boston: D. C. Heath and Co.,
1928).

7. Bogardus, E. S. ‘‘The Measurement of
Social Distance,’’ in Readings in Social
Psychology, T. M. Newcomb and E. L.
Hartley, editors (New York: Henry Holt
and Co., 1947).

8. Bousfield, W. A. ‘‘Students’ Ratings of
Qualities Considered Desirable in College
Professors,’’ School and Society, LI (Feb-
ruary 1940), pp. 253-56.

9. Buellesfield, H. ‘‘Causes of Failure Among
Teachers,’’ Educational Administration
and Supervision, I (September 1915), pp.
439-45.

10. Cuber, J. F. Sociology: A Synopsis of
Principles (New York: D. Appleton-Cen-
tury Co., 1947).

11. Dodd, S. C. ‘‘A Social Distance Test in the
Near East,’’ American Journal of Sociol-
ogy, XLI (September 1935), pp. 194-204.

12. Edmiston, R. W. and Cahill, C. M. ‘‘What
Does the Rural Community Expect of its
Teachers?’’ Educational Administration
and Supervision, XXVI (February 1940),
pp. 98-102.

13. Encyclopedia of Educational Research, W.
S. Monroe, Editor (New York: Macmillan
Co., 1950). Revised Edition.

14. Encyclopedia of Social Sciences, E. R. Se-
ligman, Editor (New York: Macmillan Co.,
1934).

15. Good, C. V., and others. The Methodology
of Educational Research (New York: D.
Appleton-Century Co., 1941).

16. Greenwood, E. Experimental Sociology
(New York: King’s Crown Press, 1945).

17. Haggard, W. W. ‘‘Some Freshmen De-
scribe the Desirable College Teacher,’’
School and Society, LVIII (September
1943), pp. 238-40.

18. Harris, C. W. ‘‘The Appraisal of a School:
Problems for Study,’’ Journal of Edu-
cational Research, XLI (November 1947),
pp. 172-82.

19. Lamke, T. A. ‘‘Personality and Teaching
Success,’’ Journal of Experimental Edu-
cation, XX (December 1951), pp. 217-57.

20. Lamson, E. F. ‘‘Some College Students
Describe the Desirable College
Teacher,’’ School and Society, LVI
(December 1942), pp. 6-15.

21. Madsen, I. N. ‘‘The Prediction of Teaching
Success,’’ Educational Administration and
Supervision, XIII (January 1927), pp. 39-47.

22. Nemec, L. G. ‘‘Teacher Certification,’’
Journal of Experimental Education, XV 
(September 1946), pp. 101-32. 

23. Newcomb, T. M. Social Psychology (New 
York: Dryden Press, 1950). 

24. Witty, P. A. ‘‘Evaluation of Studies of the
Effective Teacher,’’ in Improving Educa-
tional Research, official report of the
American Educational Research Associ-
ation (Washington, D.C.: American Ed-
ucational Research Association, 1948).

25. Wilson, L. and Kolb, W. Sociological An- 
alysis (New York: Harcourt, Brace and 
Co., 1949). 


26. Woodworth, R. S. and Marquis, D. G. Psy-
chology (New York: Henry Holt and Co.,
1947).

27. Young, P. V. Scientific Social Surveys and
Research, second edition (New York:
Prentice-Hall, 1949).




AN INVESTIGATION OF THE NEW YORK 
STATE REGENTS EXAMINATIONS 
IN SCIENCE 


GEORGE GREISEN MALLINSON 
Western Michigan College of Education 
Kalamazoo, Michigan 


JACQUELINE V. BUCK 
Grosse Pointe Public Schools 
Grosse Pointe, Michigan 


Foreword 


An investigation as extensive as the one re- 
ported herein obviously is not the work of one 
man. It represents the coordinated efforts of a 
number of science educators who have devoted 
hours of work and personal finances to accom-
plish a job that has long needed doing. Their re-
wards for all practical purposes are intangibles,
chiefly the satisfactions from jobs well done. To 
give adequate credit to these workers is impossi- 
ble. Suffice to say the list that follows contains the 
names of those to whom no adequate credit can 
ever be expressed verbally. Without their ef- 
forts, discussions about the New York State 
Regents Examinations in Science would fall 
purely into the realm of speculation and conjec- 
ture. The list follows: 


Leo Alberti                 James L. Pellowe
Jacqueline V. Buck          John J. Schmitt
Sidney V. DeBoer            Fred J. Service
Dale A. Fuelling            Wayne A. Stafford
Lois M. Mallinson           Harold E. Sturm
David J. Miller             Kenneth E. Summerer
                            Richard G. Telfer


Many other persons contributed time and 
effort in providing advice, criticisms and sug- 
gestions in various phases of the study. Among 
them are Mr. Wilton E. Baty, Chairman of the 
Committee on Regents Examinations, New York 
State Science Teachers Association; Mr. Hugh 
Templeton, Supervisor of Science Education of 
the University of the State of New York; Mr. 
Gordon E. Van Hooft, formerly President of the 
New York State Science Teachers Association; 
Dr. J. Cayce Morrison, formerly Coordinator 
of Research and Special Studies of the New York 
State Department of Education; Dr. Warren K. 
Findley, Director, Evaluation and Advisory Ser- 
vice, Educational Testing Service; Dr. Kenneth 
E. Anderson, Dean, School of Education of the 
University of Kansas; Dr. Francis D. Curtis, 
Professor-Emeritus of Education and of the 
Teaching of Science, University of Michigan;
and Miss Agnes Hodahl, formerly New York
State Representative of the National Science
Teachers Association, Albany, New York.


SECTION I 
THE EXPERIMENTAL DESIGN 


The Problem 


THE PROBLEM of this investigation is 
two-fold: (1) to investigate the attitudes of cer- 
tain science teachers from the State of New 
York toward the New York State Regents Exam- 
inations in Science, and (2) to analyze and eval- 
uate certain characteristics of the Regents Ex- 
aminations for Biology, Chemistry, Earth Sci- 
ence, and Physics prepared for the examination 
periods of January 25, 1949; June 21, 1949; Jan-
uary 24, 1950; and June 20, 1950.


Background of the Study 


On December 23, 1949, the director of this 
study met with Mr. Hugh Templeton, Supervisor 
of Science Education, of the University of the 
State of New York to discuss certain aspects of 
science teaching. During the course of the con- 
versation the work of a committee of the New 
York State Science Teachers Association con- 
cerning the attitudes of science teachers of New 
York State toward the Regents Examinations was 
discussed. The committee, under the chairman- 
ship of Mr. Wilton E. Baty, Huntington High 
School, Huntington, New York, had planned to 
poll a number of science teachers of New York 
State to obtain an objective analysis, for the first 
time, of their opinions of the Regents Examina- 
tions in Science. 

Many factors, among them time, personnel, 
and finances, made it impossible for the com- 
mittee to carry out its task. Hence, through 
the good offices of Mr. Templeton, the survey 
of opinions was delegated to the director of this 
study whose activities were still subject to the 
approval of Mr. Baty’s committee and Mr.
Templeton.

Suffice to say the survey was duly carried 
out by Mr. David J. Miller of Lakeview Junior 
High School, Battle Creek, Michigan, and was 
prepared as a report for his master’s thesis at 
the University of Michigan. With the approval 
of the University of the State of New York the 
report was subsequently published in an issue 
of Science Education.1*

During the months that followed the initial 
stages of Miller’s investigation, the director 
and Mr. Templeton met on a number of occa- 
sions to discuss the progress of the study. At 
one of these conferences, it was indicated that 
an investigation of the attitudes of the teachers 
was most desirable, but that it was unlikely that 
such an investigation would reveal the objective 
characteristics of the examinations. It was de- 
cided, therefore, that the director should pre- 
pare a research design that would provide a 
means for evaluating the objective characteris- 
tics usually evaluated in an examination as well 
as certain characteristics unique to the Regents 
Examinations in Science.

After several weeks the director submitted 
a design to Mr. Templeton. The design was
studied by Mr. Templeton and other members 
of the State Department of Education who were 
likely to be concerned. After modifications 
were made in light of criticisms and suggestions, 
it was decided that a sampling of the Regents 
Examinations for Biology, Chemistry, Earth 
Science and Physics should be item-analyzed
and the following factors considered: 


1. A determination of the reliability, consis-
tency and validity of the examinations.

2. A comparison between the scores obtained
by students from small high schools and
from large high schools.

3. A comparison between the scores obtained
by girls and boys.

4. A determination of the levels of reading
difficulty and vocabulary load of the var-
ious examinations.

5. An analysis of the types and frequencies
of scoring errors made by the teachers
who scored the examinations.

6. An analysis of the various test items on
the examinations in order to determine
their popularity, difficulty and discrimin-
atory power.


The report that follows in Section II deals
with the factors just mentioned. 


*Footnotes will be found at the end of the article.


SECTION II


SAMPLING THE REGENTS EXAMINATIONS IN 
SCIENCE AND TALLYING THE SCORES ON 
THE ITEMS 


The Problem 


THE PROBLEM of this phase of the in- 
vestigation is (1) to describe the procedure used 
in obtaining a representative sampling of Re-
gents Examinations in Science for analysis, and
(2) to describe the manner in which the scores
on the examination items were tallied and sum-
marized.


Obtaining a Representative Sampling of the Re- 
gents Examinations in Science 


In order to carry out the study it was neces- 
sary to obtain copies of the examination papers 
after they had been written by students in the 
State of New York. Copies of these papers were 
available since all the passing papers are for-
warded to the State Department of Education and
are stored for one year pending a possible re- 
view of the scores. Through the cooperation of 
Mr. Peter Muirhead, of the University of the 
State of New York, permission was received to 
obtain the needed examination papers. It was 
agreed that the number and type of papers need-
ed would be sent to the investigators provided
that students’ names and locations were kept con-
fidential, and that, at the completion of the study,
all papers would be destroyed. (Suffice to say 
the agreement was scrupulously followed and 
all papers were subsequently destroyed.) One 
weakness in this phase of the study is obvious. 
Only passing papers were available and hence 
the study does not deal with any analysis of pap- 
ers scored as failures. 

The problem of sampling the vast number of 
papers was a difficult one since an accurate an- 
alysis of the parameter of the student population 
was impossible. A number of conferences were 
held with statisticians of the State Department
of Education, University of Michigan and West-
ern Michigan College of Education. Four as-
sumptions were deemed defensible as a basis for
making the sampling:


1. Complete bundles of papers turned in by 
schools should be selected instead of sampling 
individual papers. Thus considerations of cross-
section of student population, and socio-econ-
omic level would be met beyond reasonable
doubt. 

2. The sizes of the bundles sent in to the State
are influenced by the size and type of school,
as well as geographical location. In order to 
take these factors into account in the sampling 
of papers, the distribution of sizes of bundles 
selected for analysis should be the same as the 
distribution of sizes of those turned in to the
State. 

3. The satisfaction of the first two assump- 
tions would depend on the fact that at least 1500 
papers for any one examination should be select- 
ed by a random selection of the proper sized 
bundles from the total group in storage. 

4. In order to assure that the analysis would 
be representative for a field of science, papers 
from four consecutive examinations should be
analyzed. 
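Assumptions 1 to 3 amount to a random selection of whole bundles, stratified by bundle size. The following is a minimal sketch of one way such a draw could be made; it is not the procedure actually used by the Examinations Bureau. The function name, the size-class width of 25 papers, and the top-up step are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_bundles(bundles, min_papers=1500, class_width=25, seed=0):
    """Pick whole bundles at random, preserving the mix of bundle sizes.

    bundles: list of (school_id, n_papers) tuples.
    Assumption: bundles are grouped into size classes `class_width` papers
    wide, and the same fraction of bundles is drawn from every class.
    """
    rng = random.Random(seed)
    total_papers = sum(n for _, n in bundles)
    if total_papers <= min_papers:
        # Too few papers in storage: analyze every bundle turned in.
        return list(bundles)

    fraction = min_papers / total_papers
    strata = defaultdict(list)
    for bundle in bundles:
        strata[bundle[1] // class_width].append(bundle)

    chosen, leftovers = [], []
    for key in sorted(strata):
        group = strata[key][:]
        rng.shuffle(group)
        k = round(fraction * len(group))
        chosen.extend(group[:k])
        leftovers.extend(group[k:])

    # Rounding can leave the draw short of the minimum; top up at random.
    rng.shuffle(leftovers)
    while sum(n for _, n in chosen) < min_papers and leftovers:
        chosen.append(leftovers.pop())
    return chosen


# Hypothetical storage of 400 bundles holding between 10 and 205 papers each.
stored = [(school, 10 + (school % 40) * 5) for school in range(400)]
picked = sample_bundles(stored)
print(len(picked), sum(n for _, n in picked))
```

Drawing complete bundles rather than single papers follows assumption 1, and the early return covers the case in which fewer papers are in storage than the minimum called for.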

In the final study these considerations were 
met with the following modifications: 

1. It was decided to analyze the examina-
tions in the areas of Biology, Chemistry, Earth
Science and Physics for January 25, 1949; June 
21, 1949; January 24, 1950; and June 20, 1950. 
It will be noted that four consecutive examina- 
tions were not used. The examinations pre- 
pared for August were disregarded for two reas- 
ons: 

a) The persons taking Regents Examinations
in Science in August are atypical of those
taking them in January and June. Inor-
dinate proportions consist of (1) poor stu-
dents who failed the examination at one
of the prior periods and are repeating the
examination after further study in summer
session, and (2) good students who are at-
tempting to accelerate their high-school
work via summer study.

b) Frequently, 1500 examinations in each of
the various areas of science are not for-
warded to the State in August, and hence
assumption 3 could not have been met.

2. It was decided to analyze approximately 
2000 rather than 1500 papers for each of the ex- 
aminations and periods in question in order to 
further insure a proper sampling. It was not 
possible to obtain 2000 examinations for Earth 
Science for any of the periods under considera- 
tion since in each case the total numbers of pa- 
pers were less than 2000. However, all those 
turned in were analyzed.

The process of sampling the papers was 
then undertaken through the joint efforts of the 
Examinations Bureau and the office of the Su- 
pervisor of Science Education. The papers 
were shipped at four separate periods, in each 
case one year after they had been forwarded to 
the State Department of Education. Table I 
lists the numbers of papers that were ana-
lyzed.


Tallying the Scores on Regents Examinations
in Science


Any analysis of the examinations demanded, 
of course, a means for tabulating the points of 
credit obtained by the various students on the 
various items of the examinations. It was de- 
cided therefore to prepare a tally sheet that 
would be suitable for tallying the different ex-
aminations.

The Regents Examinations in Science are di-
vided into two parts. Part I of all these exam-
inations consists of fifty items of the modified
true-false, completion and multiple-choice
types. The following are examples:


A. Modified true-false type:


‘‘30. Light is transmitted through a vacuum.
30 ______’’


(For each correct statement, the word true
is written on the line following the item. If the
statement is incorrect the term that must be sub-
stituted for the italicized term to make the state-
ment correct is written on the line following the
item.)


B. Completion type:


‘‘5. An object with an excess of electrons is
charged ______’’


C. Multiple-choice type:


‘‘14. The liquid which contracts when heated
from 3° to 4° C. is (1) alcohol (2) ker-
osene (3) mercury (4) water.’’

All the items on Part I give one point credit 
and are scored either right or wrong with no par- 
tial credit. 

It was necessary therefore to develop a tally 
sheet on which the scores obtained by every stu- 
dent on every examination item could be tallied. 
Also later in the study it would be necessary to 
compute the coefficients of reliability of Part I 
of the examinations by means of the odds-evens 
technique. Hence it was decided to develop a 
tally sheet that would serve both these purposes. 
The section of the sheet for tallying Part I was 
divided so that the total number of points ob- 
tained on the even-numbered items could be 
computed separately from the total number ob-
tained on the odd. Thus it was possible to ob-
tain a score of twenty-five points on each of
these halves. The numbers on the vertical or-
dinate of the tally sheet designate the number of
the item; those on the horizontal ordinate des-
ignate the various students whose papers were
tallied. Each student retained the same nu-
merical designation throughout the tally sheet.

If an item on Part I were answered in error, 
a dot was placed in the proper square. When 
all items were so tallied, the total numbers of 
errors for the odd items, and for the even it- 
ems, were totaled. From these totals were 
computed the total scores obtained by the stu-
dents on the odd items, even items and on both.
It should be stated here that papers of all stu-
dents receiving the same total score were tal-
lied on the same sheet. Thus papers scored
[...] were tallied on one sheet, those scored 65
on another, those scored 66 on another, and so
on. The scores obtained on the odd and the
even items, together with the total score on
Part I, were then entered in the appropriate 
spaces at the top of the tally sheet. A sample 
tally is shown on the next page. 

The tally reads as follows: 

For Part I, Student 1 gave incorrect re- 
sponses to items 3, 21, and 37 for a score of 
22 on the odd items. He gave incorrect re- 
sponses to items 22 and 48 for a score of 23 
on the even items. His total score for Part I 
is 45. 
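The reading of the tally can be checked with a short sketch. The first function reproduces the arithmetic for Student 1 from the dots entered on the sheet. The second shows how an odds-evens coefficient of reliability might be computed from the two half scores; the text names only the odds-evens technique, so the Spearman-Brown step-up used here is an assumption, and all function names are hypothetical.

```python
import math

N_ITEMS = 50  # Part I consists of fifty one-point items


def part_one_scores(incorrect_items):
    """Return (odd score, even score, total) from the items marked in error."""
    wrong = set(incorrect_items)
    if any(i < 1 or i > N_ITEMS for i in wrong):
        raise ValueError("item numbers must lie between 1 and 50")
    odd_wrong = sum(1 for i in wrong if i % 2 == 1)
    even_wrong = len(wrong) - odd_wrong
    odd = N_ITEMS // 2 - odd_wrong
    even = N_ITEMS // 2 - even_wrong
    return odd, even, odd + even


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(odd_scores, even_scores):
    """Odds-evens reliability, stepped up by Spearman-Brown (assumed here)."""
    r = pearson_r(odd_scores, even_scores)
    return 2 * r / (1 + r)


# Student 1 from the sample tally: errors on items 3, 21, 37, 22 and 48.
print(part_one_scores([3, 21, 37, 22, 48]))  # (22, 23, 45)
```

Recording errors rather than credits keeps the tally sparse for passing papers, which is consistent with the practice of placing a dot only where an item was missed.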

Part II, however, offered different problems
with respect to tallying. It consists of eight
or nine essay-type items, each worth ten points,
from which the student may elect five for a pos- 
sible total of fifty points. The items on Part II, 
however, vary from one another with respect 
to the numbers of parts they contain and the 
points of credit assigned to the parts. 

In taking Part II of the examination, the
student completely rejects either three or four
items. It is also possible in some items to elect
or to reject certain sub-parts. The former
rejections are referred to in this study as
‘‘complete rejects’’ and the latter as ‘‘partial
rejects.’’ On the entire examination a total of
one hundred points is possible if the student
correctly answers all the items.

A second part on the tally sheet was developed
to handle the various considerations in
scoring Part II. The blocks on the second part
of the tally sheet are numbered 1-8 to correspond
with the numbers of the items on Part II.

In tallying Part II the numbers of the various 
sub-parts of the items were written in the left 
columns of the blocks. On this section, how- 
ever, the actual scores of the students on the 
various parts of the items are entered in the 
appropriate squares. The first row across 
each block is labelled ‘‘Comp. Reject.’’ A 
check is placed in this row if a student rejected
an entire item. The spaces marked ‘‘Reject’’
are checked if a student rejected a sub-part
of an item (if given a choice). In the case
of examinations having nine items, an additional
block was affixed to the bottom of the sheet.




The total scores obtained by the students on the 
items they elected on Part II were then entered 
in the appropriate spaces at the top of the tally 
sheet. (See sample tally for Part I for space
allotted to total score on Part II.)

A sample tally for Part II is shown on page 


The next task was to develop a means for 
summarizing the scores thus tallied so that they 
could be analyzed statistically. As a result 
three summary sheets were prepared. 

Summary Sheet I was a duplicate of the top
section of the large tally sheet except for spaces on
the lower right in which were computed the
horizontal totals. It was from this sheet that the
scores were taken for computing coefficients of
reliability for Part I, and for computing
coefficients of consistency between the scores obtained
by the students on Part I and Part II. A separate
sheet was used for summarizing each tally
sheet.

Summary Sheet II was designed to summarize
the total number of errors, the total number of
correct answers, and the average scores for
each of the items on Part I. A separate sheet
was used for all papers receiving the same
total score.

Summary Sheet III was designed to summarize
the items on Part II. A separate sheet was
used for all papers receiving the same total
score. The following is an explanation of the
meaning of the symbols in the blocks.


PP = total possible points. Computed 
by multiplying the value of the sub- 
part by the number of persons who 
elected the sub-part. 


PE = total points earned. This was the 
total number of points received by 
the persons electing the sub-part. 


PM = total points missed. PP minus PE. 


PR = the number of persons who reject- 
ed certain sub-parts of an item if 
a choice was given. 


Av. Sc. = average score. This was obtained
by dividing the total points earned
(PE) by the number of persons who
elected that sub-part of the item.


(Percentage score: This was entered in the 
margin and was obtained by dividing the points 
earned (PE) by the total possible points (PP)) 


TA = total number of persons electing 
an entire item. 


TR = total number of persons rejecting
an entire item.


TA plus TR equals the total number of persons
whose papers were summarized on the
sheet.


[Sample tally sheet for Part I (‘‘Error Tally’’), page 47: figure not reproduced.]

[Sample tally sheet for Part II, page 48: figure not reproduced.]

The tally for Part II reads as follows: Student 1
elected item 1, having sub-parts A1, A2, B1, B2
and B3. He received 1 point on part A1, and 2
points on each of the other parts. He decided to
reject completely item 2, having parts A1, A2, B1,
B2 and B3. He elected item 3, having parts A, B,
C, D, E and F, and chose to reject, as was his
privilege, part C. He received one point of credit
on Parts B and F, and 2 on each of Parts A, D
and E, for a total of 8 points.


The nine blocks, of course, were designed 
for summarizing separately the scores obtained 
on the various sub-parts of the eight or nine 
items found on Part II of the examinations. In 
the left columns of the various blocks were en- 
tered the numbers and letters designating the 
various sub-parts of the items. 
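
The quantities entered in each block of Summary Sheet III can be sketched for a single sub-part as follows. This is a minimal illustration, not the authors' procedure: the function name and the sample scores are hypothetical, and the average score is taken here as points earned per electing student, which is an assumption about how the Av. Sc. column was filled.

```python
def summarize_sub_part(value, scores):
    """Summarize one sub-part of a Part II item.

    value:  points of credit assigned to the sub-part.
    scores: points earned by each student who elected the sub-part.
    """
    electing = len(scores)
    pp = value * electing          # PP: total possible points
    pe = sum(scores)               # PE: total points earned
    pm = pp - pe                   # PM: total points missed
    return {
        "PP": pp,
        "PE": pe,
        "PM": pm,
        "Av. Sc.": pe / electing if electing else 0.0,  # assumed: PE per elector
        "Percentage": pe / pp if pp else 0.0,           # PE divided by PP
    }


# Hypothetical data: a 2-point sub-part elected by four students.
print(summarize_sub_part(2, [2, 1, 2, 0]))
```

PR, TA and TR are simple head counts and are omitted from the sketch.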

The uses made of the computations found on
the various Summary Sheets will be indicated in
later pages of this report.

Sample Summary Sheets that are filled out
are shown on the next three pages. (Note: A
check mark below a grade denotes an error in
the scoring of a student’s paper.)


SECTION III 


THE RELIABILITY, CONSISTENCY AND VALIDITY
OF THE REGENTS EXAMINATIONS
IN SCIENCE


The Problem 


THE PROBLEM of this phase of the
investigation is (1) to describe the methods used
in computing the reliability, consistency and
validity of the Regents Examinations in Science,
and (2) to report the results of these
computations.


Methods Employed 


It was obvious that any measure of the
reliability, consistency and validity of the Regents
Examinations in Science would involve computing
coefficients of correlation. The device
chosen for use in this study was the Pearson r.
The scatter diagram technique cited by
Guilford² was employed for making the
computations.

As will be described more fully later, a
small number of classes were used, in certain
computations, for grouping the data into
intervals. Thus the estimates of correlation were
lowered to some degree. It was decided therefore
to correct for errors in grouping, using
data prepared by Peters and Van Voorhis.³ In
terms of a formula the correction is as follows:

$$r_c = \frac{r}{C_x C_y}$$

in which $r_c$ is the corrected coefficient of
correlation, $r$ is the coefficient of correlation as
computed from the coarsely grouped data, and
$C_x$ and $C_y$ are the correction factors based on
the number of class intervals in X and Y
respectively. The use of this correction seems
justified since the assumptions underlying its use
were met.
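
A minimal sketch of the correction for coarse grouping follows, assuming it takes the usual form in which the observed coefficient is divided by the product of the two correction factors. The factor values used in the example are placeholders chosen for illustration only; they are not taken from the Peters and Van Voorhis tables, and the function name is hypothetical.

```python
def correct_for_grouping(r, c_x, c_y):
    """Return the coefficient corrected for coarse grouping.

    r:   coefficient computed from the coarsely grouped data.
    c_x: correction factor for the number of class intervals in X.
    c_y: correction factor for the number of class intervals in Y.
    Both factors are assumed to lie in (0, 1], so the correction can
    only raise the magnitude of r.
    """
    if not (0 < c_x <= 1 and 0 < c_y <= 1):
        raise ValueError("correction factors must lie in the interval (0, 1]")
    return r / (c_x * c_y)


# Placeholder factors, for illustration only.
corrected = correct_for_grouping(0.70, 0.98, 0.98)
print(round(corrected, 3))
```

Because both factors are at most one, the corrected value is never smaller in magnitude than the observed one, which matches the statement that grouping lowers the estimates.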


Computing Coefficients of Reliability 


There are three common methods for com- 
puting coefficients of reliability, namely, the 
split-half, alternate-form, and multiple-admin- 
istration. Since only one form of a Regents Ex- 
amination in Science is prepared, and that ad- 
ministered only once, the only method suitable 
for use in this study was the split-half. 

This method, however, could not be used in 
determining the reliability of an entire Regents 
Examination in Science. A casual survey of a 
sample examination indicates clearly that the 
split-half method is applicable to Part I only. 
Hence, it was decided to compute the reliability 
of Part I without regard for the scores on Part
II.

The scores from Summary Sheets I were
then transferred to a scatter diagram. The scores
the students obtained on the odd items of Part
I were tallied on the horizontal ordinate, those
on the even items, on the vertical. The tallies were
made on the basis of two-point intervals, namely
6-7, 8-9, and so on up through 25 points. The
point score of 6 was set as the lower limit since
it was highly improbable that a lower score on
either the odd or even items on Part I would
have appeared on a passing examination paper.

The coefficients of reliability were then
computed and adjusted with the Spearman-Brown
formula, and corrections were made for coarse
grouping of data. Table II lists the results.
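
The split-half procedure can be sketched as below: a Pearson r is computed between the odd-item and even-item scores, and the Spearman-Brown formula then estimates the reliability of the full-length Part I. The sketch works on ungrouped scores rather than on a scatter diagram with two-point intervals, so it is a simplification of the computation described; the sample scores are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to full-test reliability."""
    return 2 * r_half / (1 + r_half)


# Hypothetical odd-item and even-item scores for five students.
odd_scores = [22, 20, 24, 18, 23]
even_scores = [23, 19, 24, 17, 21]
r_half = pearson_r(odd_scores, even_scores)
print(round(spearman_brown(r_half), 3))
```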


Computing Coefficients of Consistency


Since it was not possible to use the scores on
Part II for computing coefficients of reliability,
it was decided to compute coefficients of
correlation between the total scores the students
obtained on Part I and those they obtained on Part
II. It was assumed that such computations
(referred to here as ‘‘coefficients of consistency’’)
might show the relationship between the
abilities of students to answer correctly the
‘‘factual’’ items on Part I and their abilities to
answer the ‘‘thought’’ items on Part II.

The coefficients of consistency were comput- 
ed and corrected for coarse grouping of data. 
Table III lists the results.


[Sample Summary Sheets I, II and III, pages 50-52: figures not reproduced. The sample Summary Sheet III is headed ‘‘Summary of Part II of Tests Scored at 78’’; each of its blocks carries the column headings PP, PE, PM, PR and Av. Sc.]


TABLE I

NUMBERS OF EXAMINATIONS RECEIVED

[Table only partly legible in the source. Columns: Field of Science; Dates of Examinations (Jan. 1949, June 1949, Jan. 1950, ...); Total. Legible entries: Biology 1984, 2095, 2441, 2092, total 8612; Chemistry 1960, 1974; Earth Science 1699; Physics 2142. Row fragments that cannot be assigned to a field: 1645, 6587; 2278, 2002, 8104.]


TABLE II

COEFFICIENTS OF RELIABILITY FOR PART I OF THE
REGENTS EXAMINATIONS IN SCIENCE

[Table entries not legible in the source. Rows: January 1949, June 1949, January 1950, June 1950. Legible column headings: Date of Examination, Earth Science, Physics.]


TABLE III

COEFFICIENTS OF CONSISTENCY BETWEEN PARTS I AND II
OF THE REGENTS EXAMINATIONS IN SCIENCE

[Table entries not legible in the source. Rows: January 1949, June 1949, January 1950, June 1950. Legible column headings: Date of Examination, Chemistry, Earth Science, Physics.]


Conclusions


Insofar as the techniques used in this phase
of the investigation may be defensible, the
following conclusions seem valid:


1. Most of the coefficients of reliability
found in Table II (ten of sixteen) were .75 or 
higher. These values are considerably higher 
than similar computations for teacher-made 
tests, at least insofar as the available research 
evidence indicates. The values, however, are 
somewhat lower than those usually found for co- 
efficients of reliability for standardized achieve- 
ment tests in science. 

2. It must be kept in mind that the
coefficients of reliability were computed for only the
fifty points on Part I of the various tests rather
than for the total one hundred points. Had some
technique been available for including the points
obtained on Part II, a higher degree of
reliability might have been indicated.

3. The Regents Examinations in Science are
prepared for three examination periods during 
the year, whereas a standardized examination 
remains essentially the same from year to year 
except for occasional revisions. Thus any crit- 
icisms of the reliability must be tempered in 
the light of this fact. 

4. The coefficients of consistency in Table
III for total ranges of scores were not generally
as high as the coefficients of reliability found
in Table II. Thus there is no assurance that
Part I and Part II have the same relative degree
of difficulty for the students.


Computing Coefficients of Validity 


Ordinarily the validity of a measuring instru- 
ment is determined by comparing the scores ob- 
tained on that instrument with criterion data 
for the factor being measured. In the case of 
this study only one measure for each student 
was available. Thus there was no possibility of 
comparing the scores the students obtained on 
the Regents Examinations in Science with the 
scores they obtained on any other measure. 
Hence, it was decided to use an internal
criterion.

The various examinations were submitted to
at least five members of the National Association
for Research in Science Teaching, who
taught science education at the college or
university level and who, at one time or another,
had worked on some phase of test construction.
They were asked to identify on Part II, for each
of the examinations, an item or part of an item
that could be considered defensibly to be a
measure of each of the following objectives:


a. Ability to apply scientific principles 

b. Possession of scientific attitudes 

c. Ability to employ problem-solving skills 
(elements of scientific method) 




If a majority of the specialists identified a
certain item or part of an item as being a
measure of one of the objectives listed above, 
the scores on these items or parts thereof, were 
considered tentatively as being criterion data 
for these objectives. The items thus identified 
were then resubmitted to all the specialists who 
were asked to examine carefully the items (or 
parts) and to make their judgments as to wheth- 
er they could be considered defensibly as being 
measures of the objectives. Only those items 
considered suitable by four of the five judges 
were retained. In some cases the specialists 
failed to identify a suitable criterion item. In 
the case of these examinations computations 
for validity were omitted. 
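
The two-stage selection of criterion items described above can be sketched as follows. The data layout (a count of nominating or approving judges per item), the item labels, and the function name are all hypothetical; the thresholds follow the text, a simple majority in the first round and four of five judges in the second.

```python
def select_criterion_items(first_round, second_round, n_judges=5, required=4):
    """Return the items retained as criterion data for one objective.

    first_round:  item label -> number of judges who nominated the item.
    second_round: item label -> number of judges who approved the item
                  when the tentative list was resubmitted.
    """
    majority = n_judges // 2 + 1
    # Stage 1: items nominated by a majority become tentative criteria.
    tentative = {item for item, votes in first_round.items() if votes >= majority}
    # Stage 2: only items approved by the required number of judges are kept.
    return sorted(item for item in tentative
                  if second_round.get(item, 0) >= required)


# Hypothetical votes for one objective on one examination.
nominations = {"3a": 4, "5": 3, "7": 2}
approvals = {"3a": 5, "5": 3}
print(select_criterion_items(nominations, approvals))  # ['3a']
```

An empty result corresponds to the cases in which no suitable criterion item was identified and the validity computation was omitted.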

Table IV lists the various items or parts of 
items that were identified as being suitable for 
the intended purpose. The scores obtained by 
the students on the total test were then plotted 
on the horizontal ordinate of the correlation 
chart and those obtained by the students on the 
criterion items were plotted on the vertical or- 
dinate. The computations for validity were then 
made and corrected in the same manner as those 
for consistency. Tables V, VI, and VII list the 
coefficients of validity thus computed. 


Conclusions 


Insofar as the techniques used in this phase 
of the investigation may be defensible, the fol- 
lowing conclusions seem valid: 


1. An examination of Table VI indicates
that for none of the examinations for Chemistry 
and Earth Science was an item or part of an it- 
em on the respective Part II considered to be a 
measure of the possession of scientific attitudes. 

2. Tables V through VII indicate that the co- 
efficients of validity for the various objectives 
differ greatly. In some cases they can be con- 
sidered high, in other cases, low. Thus it is 
difficult to generalize with respect to the valid- 
ity of the different Regents Examinations in 
Science. 

3. In general, one might state that the
Regents Examinations in Science are better
measures of the ability to apply scientific principles
than to use elements of scientific method. The
data for scientific attitudes are not sufficiently
extensive to warrant a conclusion.

4. As compared with the validity of
teacher-made tests, that of the Regents Examinations in
Science is, in general, high. As compared with
certain standardized tests the validity seems to
rate rather favorably, although in some cases
it seems low.

5. It should be kept in mind that the methods
for computing the coefficients of validity are not
the ones ordinarily used. Hence, the above
conclusions must be evaluated in terms
of this fact.


TABLE IV

NUMBERS OF ITEMS, OR PARTS OF ITEMS, USED AS
CRITERION DATA*

[Table entries not legible in the source. Columns: Examination and Date; Number of Item or Part of Item for (1) Understanding of Scientific Principles, (2) Possession of Scientific Attitudes, (3) Ability to Use Elements of Scientific Method. Rows: Biology, Chemistry, Earth Science and Physics, each for January and June of 1949 and 1950.]

*The dashes indicate that no item was judged as being suitable
to serve as criterion data.


TABLE V

COEFFICIENTS OF VALIDITY (UNDERSTANDING OF
SCIENTIFIC PRINCIPLES)

[Table entries not legible in the source. Columns: Date, Biology, Chemistry, Earth Science, Physics. Rows: January 1949, June 1949, January 1950, June 1950. At least one cell reads ‘‘No Adequate Criterion.’’]


TABLE VI

COEFFICIENTS OF VALIDITY (SCIENTIFIC ATTITUDES)

[Table entries not legible in the source. Columns: Date, Biology, Chemistry, Earth Science, Physics. Most cells read ‘‘No Adequate Criterion.’’]


TABLE VII

COEFFICIENTS OF VALIDITY (ELEMENTS OF SCIENTIFIC METHOD)

[Table entries not legible in the source. Columns: Date, Biology, Chemistry, Earth Science, Physics. Some cells read ‘‘No Adequate Criterion.’’]


SECTION IV 


AN INVESTIGATION OF THE RELATIVE
ACHIEVEMENTS OF MALES AND FEMALES
AND OF RURAL AND URBAN STUDENTS,
ON THE REGENTS EXAMINATIONS
IN SCIENCE⁴


The Problem 


THE PROBLEM of this phase of the
investigation is to determine whether achievement
on the Regents Examinations in Science
(1) varies with the sex of the student, and (2)
varies with the size of the school in which the
student is enrolled.


Methods Employed 


In practically every level of science educa- 
tion two questions arise frequently. The first 
deals with the relative achievements of boys and 
girls, and the second, with the relative achieve- 
ments of rural and urban students. The vast 
numbers of examination papers that were tal- 
lied in this investigation made possible a study 
of the questions. 

The first step in this phase of the investiga- 
tion was to tabulate, for each separate bundle 
of papers, the following information: 


1. The field of science for which the exam- 
ination was prepared 

2. The date for which the examination was 
prepared 

3. The name and location of the school from 
which the examination papers were re- 
ceived 

4. The population of the school for the school
year in which the examination was
prepared. (Note: It was decided to accept
the school enrollment for the year 1948-49
as the base population, since it did not,
in most cases, differ sufficiently from
that of 1949-50 to make a distinction.)


The information for the first three points was
found on the examination papers. That for point
four was found in the Forty-Sixth Annual Report
on the Education Department for the School Year
ending June 30, 1949, Volume 2, entitled
Statistics (Albany: University of the State of New
York, 1950, pp. 353). The information for the
populations of high schools in the City of New
York was not listed in this publication. It was,
however, obtained from the Supervisor of
Science Education of the State Department of
Education.

The next step was to tabulate separately for
males and females the scores made on Part I,
Part II and on the total examination. This task
was relatively simple.

However, it was more difficult to classify a
school as being urban or rural in character.
Hence, rather than tally scores according
to this method of classification, it was decided to
tally them according to the size of the school in
which the students were enrolled. For the
purpose of this investigation the Southern Michigan
classification for size of high school was used.
As set up in the Handbook of the Michigan High
School Athletic Association for the School Year
of 1952-53 the classification is as follows:


1. Class A - over 800 students 

2. Class B - 325 to 799 students 

3. Class C - 150 to 324 students 

4. Class D - less than 150 students 


This classification, however, did not prove
to be completely satisfactory. It did not seem
reasonable to classify the students of a school
with an enrollment of 1000 with those from a
large New York City high school, such as James
Madison School with a population of over 6000.
Therefore an additional classification was
added, namely AA, or schools with a population of
over 1500.
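
The size classification, with the added class AA, can be expressed as a small lookup. This is an illustrative sketch with a hypothetical function name. The published limits leave an enrollment of exactly 800 unassigned (Class A is ‘‘over 800’’ and Class B ends at 799); the sketch places it in Class A, which is an assumption.

```python
def classify_school(enrollment):
    """Return the size class (AA, A, B, C or D) for a school enrollment."""
    if enrollment > 1500:
        return "AA"   # class added for this investigation
    if enrollment >= 800:
        return "A"    # assumption: exactly 800 is treated as Class A
    if enrollment >= 325:
        return "B"
    if enrollment >= 150:
        return "C"
    return "D"


for size in (6000, 1000, 500, 200, 100):
    print(size, classify_school(size))
```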

A copy of the sheet on which the scores were 
tabulated is shown on the next page. 

In order to determine the significance of the
variances that might exist among the scores on
the basis of sex and class of school, it was
decided to use the analysis of variance technique
with the double entry table described by
Lindquist.⁵ However, there was a great inequality
in replications since papers were sampled from
those contributed by the various classes of
schools in the same proportions as the various
classes contributed papers to the total number
sent into the state. As a result the factor of
non-orthogonality was present in the design. The
procedure for the final ‘‘F’’ test was the one
suggested by Snedecor⁶ for use with two-way
classifications with unequal replications in which
corrections are made for non-orthogonality.
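
The proportional sampling that produced the unequal replications can be illustrated with a short sketch. It allocates a sample across the classes of schools in proportion to the papers each class contributed, using largest-remainder rounding so that the allocations sum to the sample size. The paper counts, the sample size, and the rounding rule are all hypothetical; the report does not state how the proportions were rounded.

```python
def proportional_allocation(papers_by_class, sample_size):
    """Allocate a sample across classes in proportion to papers contributed."""
    total = sum(papers_by_class.values())
    exact = {c: sample_size * n / total for c, n in papers_by_class.items()}
    allocation = {c: int(quota) for c, quota in exact.items()}
    leftover = sample_size - sum(allocation.values())
    # Give the remaining papers to the classes with the largest fractions.
    by_fraction = sorted(exact, key=lambda c: exact[c] - allocation[c], reverse=True)
    for c in by_fraction[:leftover]:
        allocation[c] += 1
    return allocation


# Hypothetical counts of papers received from each class of school.
received = {"AA": 5000, "A": 2500, "B": 1500, "C": 700, "D": 300}
print(proportional_allocation(received, 200))
```

Cell sizes that follow the class proportions in this way are unequal, which is why a correction for non-orthogonality was needed in the two-way analysis.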

A copy of the analysis sheet used for this pur- 
pose is shown on page 59. 

Table VIII presents the results of the
computations just described.


[Tabulation sheet, page 58: figure not reproduced. Legible headings: Name of Examination; Date of Examination; Name of School; Population, 1948-49 and 1949-50; Male; Student; Part II.]

[Analysis sheet, page 59, ‘‘Analysis of Variance (Two-way classification)’’: figure not reproduced. Legible headings: Examination; Date; Part(s); Class of School; Girls; Summary Table listing Source of Variance (Sex, Class, Interaction, Sub-total, Within, Total), Preliminary sum of squares, and Corrected sum of squares.]


TABLE VIII

ANALYSES OF VARIANCES

[Table, pages 60-65: entries not legible in the source. Legible column headings: Examination; Part(s); Source of Variance (sex, class, interaction); Interpretation (Significance, High, Low). Rows cover Part I, Part II, and Parts I & II for the Biology, Chemistry, Earth Science and Physics examinations of January 1949, June 1949, January 1950 and June 1950.]


It may be noted that in several cases the
variances with respect to interaction are significant.
These occur on Parts I and II, and total score of
the Biology Examination for June 1950; on Parts
I and II of the Chemistry Examination for
January 1949; on Part II of the Chemistry
Examination for June 1949; and on Parts I and II, and
total score of the Chemistry Examination for
June 1950.

In these cases it is possible that differences 
in curriculum, and the organization and admin- 
istration of the schools are the causes of observ- 
able variances, rather than the factors of size 
of school and sex. 

However, in the cases where the observed 
variances could be attributed reasonably to sex 
or size of school, the following observations 
seem defensible: 


1. On the various Biology Examinations,
boys are significantly better than girls on two
occasions, and girls are significantly better
than boys on two occasions. Students from AA
schools prove to be superior on nine occasions,
while students from class D schools prove to be
the lowest on five occasions, from class A on
three occasions, and from class B on one
occasion.

2. On the various Chemistry Examinations,
boys are significantly better than girls on four
occasions, while in no case did the girls prove
to be significantly better than the boys.
Students from the AA schools prove to be superior
in five cases, while students from class B
schools are lowest in one case, from class C
in two cases, and from class D in two cases.

3. On the various Earth Science
Examinations, boys are superior in three cases, while
girls are superior in one. Students from the
AA and A schools each prove to be superior in
one case, and students from class B schools in
two cases. Students from class C schools are
lowest in two cases, and those from class B
schools, once.

4. On the Physics Examinations, boys are 
superior in five cases, while in no case are 
girls superior. Students from the class AA 
schools are superior in eight cases, and those 
from class A schools, in three cases. Stu- 
dents from class A schools are lowest once, 
from class B twice, class C six times, and 
class D twice. 

5. It may be stated, therefore, that in
thirty-one out of forty-eight cases there is no variance
attributable to the sex factor. However, out of
the remaining seventeen cases the variances
are significantly in favor of the boys in fourteen,
and significantly in favor of the girls in three.

6. It may be stated that in eighteen cases
there are no variances attributable to size of
school. In the remaining thirty cases the
variances are significantly in favor of students
from class AA schools in twenty-three, while
in no case are they in favor of students from
class C or D. In nineteen of thirty cases
students from class C and D schools appeared to
exhibit less achievement, at least insofar as
the variances may be criteria.
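
The totals reported for the sex factor and for class AA schools can be checked against the counts given field by field in observations 1 through 4. The sketch assumes that the forty-eight cases are the sixteen examinations taken three ways (Part I, Part II, and total score); the variable names are hypothetical.

```python
# Significant differences by sex, as reported for each field of science.
sex_results = {
    "Biology": {"boys": 2, "girls": 2},
    "Chemistry": {"boys": 4, "girls": 0},
    "Earth Science": {"boys": 3, "girls": 1},
    "Physics": {"boys": 5, "girls": 0},
}
# Occasions on which class AA schools proved superior, by field.
aa_superior = {"Biology": 9, "Chemistry": 5, "Earth Science": 1, "Physics": 8}

TOTAL_CASES = 16 * 3  # assumed: sixteen examinations x (Part I, Part II, total)

boys_favored = sum(field["boys"] for field in sex_results.values())
girls_favored = sum(field["girls"] for field in sex_results.values())
no_sex_difference = TOTAL_CASES - boys_favored - girls_favored
aa_total = sum(aa_superior.values())

print(boys_favored, girls_favored, no_sex_difference, aa_total)  # 14 3 31 23
```

The four totals agree with the figures of fourteen, three, thirty-one and twenty-three stated in observations 5 and 6.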




In conclusion, boys from the large high 
schools appear to score significantly higher on 
the Regents Examinations in Science than any 
other single group, while girls from small high 
schools appear to exhibit less achievement than 
any other single group. 


SECTION V 


THE VOCABULARY LOAD AND LEVEL OF
READING DIFFICULTY OF THE REGENTS
EXAMINATIONS IN SCIENCE⁷


The Problem 


THE PROBLEM of this phase of the in- 
vestigation is to evaluate the Regents Examina- 
tions in Science with respect to their vocabulary 
loads and levels of reading difficulty. 


Methods Employed 


The first step was to find a technique for
evaluating the vocabulary load of the Regents
Examinations in Science. The Flesch⁸ formula (as
well as other reading formulae) was obviously
not suitable for the intended purpose since it is
used ordinarily with passages of at least one
hundred words and involves complete
sentences rather than the type of material found
on the Regents Examinations. Some of the
examinations, for example, contain completion
and multiple-choice items that do not adapt
themselves readily to the use of the Flesch formula.
Therefore it was decided to use the word-count
method.

The next step was to tally all the words that
appeared on the sixteen examinations. The words
in the directions for writing the examinations,
however, were not tallied, nor were numbers
unless they appeared as words. Empirical formulas
and structural formulas were not tallied.
All other words, including those found on charts
and diagrams, were tallied.

Next, the words thus tallied were classified
into two broad categories: (1) technical words,
and (2) non-technical words. The technical
category was further sub-divided into two
classifications: (a) essential, and (b) desirable. The
non-technical words were also divided into two
classifications: (a) difficult, and (b) easy.
These categories and classifications were
established as follows: Letters were sent to sixty
teachers who taught in each of the areas of
Biology, Chemistry, Earth Science and Physics
in the State of New York, asking if they
would be willing to evaluate lists of
words in their respective teaching fields. A
copy of this letter follows:




October 8, 1952 
Dear 


At the present time the University of the
State of New York, through its science, research
and statistical divisions, is undertaking
an extensive investigation of the New York State
Regents Examinations in Science. One of the
facets of the investigation is to determine
whether or not the vocabulary load on the
examinations may be excessive.

According to the University, you have at one
time or another taught Earth Science* and have
administered and scored Regents Examinations
in that area. Hence, we have a request to make
of you. Would you be willing to evaluate a list
of terms found on four representative Regents
Examinations in Earth Science? If so, I would
appreciate receiving an indication of your
willingness on a postcard. If you agree, we shall
do the following:

1. Send you a copy of the list of terms to- 
gether with instructions and a self-addressed 
envelope in which to return your evaluation. 

2. Give you full credit for your work in the 
final report as well as informing your adminis- 
trator and board of education of your efforts. 

The job will take about one hour and should 
be completed one week after receiving the ma- 
terial. We sincerely hope that you will have 
time to help us out. 


Sincerely, 


George G. Mallinson, Director 
Evaluation Program for 
Science Regents 
lm 


(*Note: The area of science named in the letter
depended on the word list the individual was
requested to evaluate.)


To this request, many teachers indicated
their assent. They were then sent a
mimeographed copy of all the terms that appeared on
the examinations in their respective teaching
fields. They were asked to evaluate each word
on the list according to instructions found in an
accompanying letter, a copy of which follows:


Dear 


Your recent communication indicated that 
you would be willing to assist in evaluating the 
vocabulary content of the New York State Re- 
gents Examinations in Physics.* Your cooper- 
ation is more than appreciated. 

Enclosed you will find a list of “Terms for
Physics” together with a stamped envelope in
which to return your evaluation. It is not
necessary to sign your name. The following are the
information and instructions for making the
evaluation:

1. The list of terms consists of all the words 
and terms that appeared on the four successive 
Regents Examinations in Physics for January 
1949, June 1949, January 1950 and June 1950. 

2. The terms may be divided into two cate- 
gories (a) non-technical, and (b) technical. 

(a) Non-technical terms are those that a per- 
son is likely to use at one time or another 
in his everyday conversation, or read in 
the newspaper or other literature not con- 
cerned specifically with physics. 

(b) Technical terms are those that a student 
would encounter specifically in a course 
in Physics. While such terms might be 
encountered in other courses or other 
places, an adequate understanding of the 
usual topics and principles of a typical 
course in physics would demand an under- 
standing of, and the ability to use and ap- 
ply, them. 

3. Please examine the list of terms one by 
one. If you think a term fits the definition of 
‘‘technical term’’, place an asterisk (*) before 
it. If you do not think the term fits the defini- 
tion, ignore it. Reexamine your list to see if 
your judgment is consistent. 

4. Then examine all the terms before which
you placed an asterisk (*). If you believe that
such a term is absolutely essential to an adequate
understanding of topics and principles found in
a typical course in physics, place a second
asterisk (**) before the term. If, however, you
believe the term to be merely a desirable
technical term, leave it marked with but one
asterisk (*).

Again let me say that your cooperation is ab- 
solutely essential and more than appreciated. 

In the final report due credit will be given your 
efforts and your administrator and board of ed- 
ucation will be notified. 

Your evaluation will be appreciated as soon 
as convenient and a copy of the final report will 
be sent you if you so request. 


Sincerely, 


George G. Mallinson, Director 
Program of Evaluation 
New York State Regents
Examinations in Science
lm 
Enc. 2 


(*Note: The area of science named in the letter 
depended on the word list the individual agreed 
to evaluate. ) 




After a period of about four weeks, 23 Biology,
32 Chemistry, 24 Earth Science, and 24
Physics lists were returned. These were then
tallied on a “master tally list” for each of the
subjects.

If a word was checked “essential” by a total
of ten or more respondents it was considered
to be an “essential” term. (For example,
the word “atomic” appearing on the list of
Physics terms was checked “desirable” by five
Physics teachers and “essential” by eleven.)
However, if a word was checked “essential” or
“desirable” by a total of five or more teachers
(but checked “essential” by fewer than ten) it
was considered “desirable.” (For example, the
word “atmosphere” appearing on the list of
Physics terms was checked “desirable” by seven
Physics teachers, and “essential” by five.
Therefore it was considered “desirable.”)
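The classification rule just described can be sketched in a few lines of Python. This is only an illustration: the function name `classify_term` and its arguments are hypothetical, and the final branch (fewer than five checks in all means non-technical) is an assumption drawn from the treatment of the remaining words rather than an explicit statement of the rule.

```python
def classify_term(essential_votes: int, desirable_votes: int) -> str:
    """Classify one word from the teachers' tallies (illustrative sketch)."""
    # Ten or more "essential" checks make the word an essential term.
    if essential_votes >= 10:
        return "essential"
    # Five or more checks of either kind (with fewer than ten
    # "essential" checks) make the word a desirable term.
    if essential_votes + desirable_votes >= 5:
        return "desirable"
    # Assumption: any other word is treated as non-technical.
    return "non-technical"


# The two examples given for the Physics list:
print(classify_term(essential_votes=11, desirable_votes=5))  # "atomic"
print(classify_term(essential_votes=5, desirable_votes=7))   # "atmosphere"
```

Under this sketch “atomic” comes out as essential and “atmosphere” as desirable, in agreement with the two examples cited.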

All words rated as being “essential” or
“desirable” were considered to be part of the
technical vocabulary and hence were not deemed to
be difficult. The remaining words, not rated
as being part of the technical vocabulary, were
considered to be non-technical terms, and
therefore words which a student might find difficult.
These non-technical words were then checked
by means of the Buckingham-Dolch⁹ word list
in order to determine their grade-levels of
difficulty. It was assumed that the courses in
science would be taken by some students at these
grade levels: Biology, ninth grade; Earth
Science, tenth grade; Chemistry, eleventh grade;
and Physics, eleventh grade. Any non-technical
word was therefore considered to be difficult
if it was rated above the respective
grade level in the word list.

Non-technical words not appearing in the
Buckingham-Dolch list were also considered to
be difficult.
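A minimal sketch of this grade-level check follows. The course grade levels are those assumed in the study; the function name `is_difficult`, the dictionary layout, and the sample words and grade placements are hypothetical and are not taken from the Buckingham-Dolch list itself.

```python
# Grade level at which each course was assumed to be taken.
COURSE_GRADE = {
    "Biology": 9,
    "Earth Science": 10,
    "Chemistry": 11,
    "Physics": 11,
}


def is_difficult(word: str, subject: str, word_list_grades: dict) -> bool:
    """Return True if a non-technical word counts as difficult for a subject."""
    grade = word_list_grades.get(word)
    # Words absent from the word list were also considered difficult.
    if grade is None:
        return True
    # A word is difficult if it is rated above the course's grade level.
    return grade > COURSE_GRADE[subject]


# Hypothetical grade placements, for illustration only.
sample_grades = {"word_a": 8, "word_b": 12}
print(is_difficult("word_a", "Biology", sample_grades))
print(is_difficult("word_b", "Physics", sample_grades))
print(is_difficult("word_c", "Chemistry", sample_grades))
```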

Table IX lists the numbers of words on the
various Regents Examinations in Science that
fall into the various categories mentioned.


Conclusions 


No listing will be made here of the different 
words falling into the various categories. How- 
ever, insofar as the techniques used in this 
study may be valid, the following conclusions 
seem justified:


1. The greatest number of technical words
(271) was found on the June 1950 examination
in Biology, the smallest number (213) on the June
1949 examination in Physics. Thus the numbers
of technical words on the different
examinations do not vary greatly. Further, it does
not seem likely that the vocabulary load with
respect to technical words is excessive.




2. The greatest number of difficult non-technical
words (8) was found on the June 1950
examination in Biology, while there were no
difficult non-technical words on the Chemistry
examinations for January and June 1949. Hence it is
rather unlikely that the numbers of difficult
non-technical words are excessive.

3. The findings just indicated fail to show that 
there is any justification for criticizing the Re- 
gents Examinations in Science on the basis of 
their vocabulary loads and hence their levels of 
reading difficulty. 


SECTION VI 


ERRORS AND INCONSISTENCIES IN SCORING 
THE REGENTS EXAMINATIONS IN SCIENCE 


The Problem 


THE QUESTIONNAIRE that was sent to
the science teachers in the initial stages of this
investigation revealed that about two-thirds of
them believed that a sampling of the examination
papers should be rechecked after they were
turned in to the State. This would seem to
indicate that the teachers believed that there might
be some errors and inconsistencies in the scoring
of the examinations. Hence, it is the problem
of this phase of the investigation (1) to
determine whether or not the belief is valid; and if
so, (2) to determine the types and frequencies
of scoring errors that appear in the papers
analyzed in this study.


Methods Employed 


While tallying the scores that the students re- 
ceived on the various items on the examinations, 
the investigators recorded the obvious errors 
and inconsistencies that appeared in scoring 
them. The resulting lists were then studied. 
In general, it was found that the errors and in- 
consistencies could be classified into these four 
major categories: 


1. Errors in the addition of points of credit

2. Errors resulting from failure to follow
State-prescribed scoring procedures

3. Inconsistencies and errors in correction

4. Miscellaneous


TABLE IX

NUMBERS OF WORDS BELONGING IN DIFFERENT CLASSIFICATIONS

                            Number of Different        Number of Different       Total
                            Technical Words            Non-Technical Words       Number of
                                                                                 Different
Science        Examination  Essential Desirable Total  Easy  Difficult  Total    Words

Biology        Jan. 1949              143       264
               June 1949    134       129       263    415   4          419      682
               Jan. 1950    119       135       254                     367      621
               June 1950    139       132       271    385   8          393      664
Chemistry      Jan. 1949    171        73       244    184   0          184      428
               June 1949    183        59       242    190   0          190      432
               Jan. 1950    188        67       255    190   1          191      446
               June 1950    174        72       246    199   1          200      446
Earth Science  Jan. 1949    110       126       236    217   3          220      456
               June 1949    110       145       255    238   3          241      496
               Jan. 1950    110       130       240    217   6          223      463
               June 1950     97       145       242    228              233      476
Physics        Jan. 1949    108       130       238    319   6          325      563
               June 1949    105       108       213    323   6          329      542
               Jan. 1950    101       133       234    333   4          337      571
               June 1950    111       129       240    327   4          331      571


1. Errors in the Addition of Points. This
category of error and inconsistency was by far
the most extensive. There were several stages
in the scoring of a paper where such errors in
addition could occur, namely,

a) Errors in adding the Part I and Part II
scores to obtain the total test score.

b) Errors in totaling the Part I score. There
were two major chances for error here;
a mistake could be made (1) in totaling
the number of points to be deducted
because of a student's failure to answer
items correctly; and (2) in subtracting this
number of points from the maximum point
value of fifty for Part I.

c) Errors in totaling the Part II score. In
this case there were several ways in
which the errors could occur. At times
simple errors could be made in totaling
the scores on the parts of items. At other
times errors in subtraction resulted when
the points to be deducted because of error
were totaled and subtracted from ten (the
maximum point value for each individual
Part II item). In still other cases the
correction marks of the teacher were so
light or illegible that they were apparently
overlooked when the points were totaled.
(This latter type of error would,
of course, have been avoided if cumulative
scores had been kept, as suggested
by the State.)
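The arithmetic behind these totals can be illustrated with a short sketch. It contrasts the deduction method, in which points lost are summed and subtracted from the maxima of fifty (Part I) and ten (each Part II item), with a running cumulative total of points awarded. The function names and the sample paper are hypothetical; only the two maximum values come from the text.

```python
PART_I_MAX = 50        # maximum point value for Part I
PART_II_ITEM_MAX = 10  # maximum point value for each Part II item


def total_by_deduction(part_one_deductions, part_two_deductions_per_item):
    """Total a paper by subtracting deducted points from the maxima."""
    # Two chances for error: summing the deductions, then subtracting.
    part_one = PART_I_MAX - sum(part_one_deductions)
    part_two = sum(
        PART_II_ITEM_MAX - sum(deductions)
        for deductions in part_two_deductions_per_item
    )
    return part_one + part_two


def total_cumulative(points_awarded):
    """Total a paper by keeping a running sum of points awarded."""
    running = 0
    for points in points_awarded:
        running += points  # the running total is updated at every item
    return running


# Hypothetical paper: eight Part I items missed, five Part II items answered.
by_deduction = total_by_deduction([1] * 8, [[2], [3, 1], [], [5], [4]])
cumulative = total_cumulative([42, 8, 6, 10, 5, 6])
print(by_deduction, cumulative)
```

Both methods give the same total when no slip is made; the deduction method simply involves more separate sums and subtractions at which a slip can occur.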


Certain other errors were made in totaling
Part II scores because of the failure to follow
correct scoring procedures. Such cases will
be discussed in the next two parts of this
section.

It is interesting to note that the greatest 
number of errors in making totals accrued to 
the benefit of the student, that is, the total 
score awarded the paper was higher than the 
correct total. For example, out of 2011 
Biology Examinations for January 1949, 
ninety-two errors in making totals were detected.
Of these, only six scores were lower than
they should have been; the rest were higher.

The tables that follow indicate the frequency
with which the various types of errors just
described occurred. Since the approximate
percentages of errors were the same for all sixteen
sets of examinations, data are included for
only one examination in each of the four areas
of science.

2. Errors Resulting from Failure to Follow
State-Prescribed Scoring Procedures. The
State of New York issues a manual¹⁰ listing the
procedures to be followed in scoring Regents
Examinations. Many of these suggested procedures
were violated by a number of teachers,
and these violations in many instances led to
incorrect examination scores. Examples of
these types of errors follow:

a) Failure to keep a cumulative score. The
State suggests that for each item or part of an
item the points awarded should be indicated on
the test paper and a cumulative positive score
be kept throughout the paper. This means that
the points awarded for answering correctly
each item or part of an item are totaled
continuously as the paper is scored. Hence at the
completion of the last item the resulting cumulative
score will represent the total score of the paper.

By far the greatest number of teachers tallied
the points deducted rather than the points
awarded for the answers to items or parts of
items. Among those who did indicate the points
awarded, many failed to keep cumulative
scores. Obviously the practice of indicating the
number of points deducted is more subject to
error than the method recommended by the State.
The teacher may make a mistake in totaling the
points to be deducted and make another in
subtracting this total from the maximum point
value allotted the item or part of an item. For
example, such errors occurred on 117 out of 1699
papers in Earth Science for June 1949.

b) Failure to score items in sequence. On
Part II of the examinations a student has the
option of selecting five out of a possible eight or
nine items. On some of these items he may
omit one or two parts of the item. Occasionally
all eight or nine items or all parts of a single
item were answered by a student. In certain cases
if one item (or part of an item) that appeared in
the middle was answered poorly or incorrectly,
some teachers skipped it and gave credit for the
later sections that were answered more
correctly.

The State requires that the items or parts of
items should be scored in order of appearance,
omitting the last item or part. Failure to do so,
of course, may give a student a higher score
than he deserves.

c) Scoring of papers by several different
teachers. The State suggests that one teacher
should score all items on any given examination
paper. However, in many cases, particularly
in larger schools, it was obvious that teachers
worked together on scoring a group of papers.
For example, one teacher might score items
one through ten on a group of papers; another
teacher items eleven through twenty. Such a
procedure often led to inconsistencies and errors
when the total score of a paper was computed.

d) Counting scores below 62 as passing. The
cutting score for passing papers has been set by
the State at 65. However, recognizing that
errors may occur in the correction of papers, a
three percent correction error has been allowed.
Thus scores from 62 to 64 are considered as
“passing.” These “below level” scores of 62,
63 and 64 are recorded as (65). In several
cases, however, scores of 61, 60 and even 59
and 58 were recorded as (65).

In addition, scores that should have been
recorded as (65) were sometimes recorded as 65.
This occurred on twenty-four papers onthe Bi- 


TABLE X


NUMBERS AND PERCENTAGES OF ERRORS IN THE ADDITION OF
POINTS ON THE REGENTS EXAMINATIONS IN BIOLOGY FOR
JUNE, 1949


Total     Number of   Number of   Percentage
Score     Papers      Errors      of Errors

(65)      280         44          15.6%
65         62         13          21.0%
66         58         11          19.0%
67         61          8          13.0%
68         48          6          12.5%
69         64          7          11.0%
70         89          9          10.0%
71         48          4          16.7%
72         66         13          19.6%
73         60         13          22.0%
74         64                     14.8%
75        100         32          32.0%
76         87                     14.0%
77         84         12          14.3%
78         87         19          21.8%
79         67          6           9.0%
80         62         10          12.2%
81         83                     10.6%
82         62                      9.7%
83         64                     15.6%
84         64                     12.9%
85         62                     16.0%
86         53                     13.0%
87         30                     20.0%
88         61                      7.8%
89         42                     14.3%
90         42                      9.5%
91         38                      5.3%
92         34                     21.0%
93         32                     16.6%
94         22                      9.8%
95         12                     16.7%
96         12                     16.7%
97         13                     15.4%
98          7
99          1
100




TABLE XI 


NUMBERS AND PERCENTAGES OF ERRORS IN THE ADDITION OF 
POINTS ON THE REGENTS EXAMINATIONS IN CHEMISTRY FOR 
JANUARY, 1950 


Percentage 
of Errors 


6.™% 
18.9% 
18.2% 
17.2% 
16.7% 
16.7% 
6.1% 
2.1% 
8.1% 
20.0% 
19.5% 
10.9% 
15.7% 
5.1% 
2.2% 
6.4% 
7.1% 
9.1% 


Papers Papers | Errors of Brreve 
os 37 9.4% 
66 33 9.4% 
67 35 
36 4.4% 
69 36 
70 33 10.% 
n an 6.6% 
or 6.8% 
73 4.0% 
41 0.9% 
76 64 nal 
76 61 
77 69 2.0% 
18 1.4% 
79 47 
80 66 
81 55 


TABLE XII


NUMBERS AND PERCENTAGES OF ERRORS IN THE ADDITION OF 
POINTS ON THE REGENTS EXAMINATIONS IN EARTH SCIENCE 
FOR JANUARY, 1950 


Percentage 
of Errors 


13.6% 
10.0% 
14.6% 
15.9% 
7.9% 
3.4% 
13.3% 
6.7% 
6.4% 
4.1% 


6 
7 
8 
8 
5 
9 
3 



Totel | number Total 

65) | 163 26 15.9% 82 “4 

65 50 12 24.0% 83 0 

66 40 15.0% 84 41 

67 46 15. 2% 85 43 

68 51 15.™% 86 38 

69 62 12.9% 87 29 

70 47 10.7% 30 

50 18.0% 30 

72 37.5% 31 
73 1 0 91 22 

1% 0 0 92 14 7.1% 
@) | 158 16 10.1% 3 | 1s 6.™% 
15 62 15 24.2% 9 22.2% 
76 38 5 13.2% ee 9 0 

47 4 8.5% 7 14. 
78 43 6 13.9% “ed 2 60.0% 
79 46 6 13.5% 1 

80 47 3 6.4% 

| 66s 6 | ° 




TABLE XIII


NUMBERS AND PERCENTAGES OF ERRORS IN THE ADDITION OF 
POINTS ON THE REGENTS EXAMINATIONS IN PHYSICS FOR 
JANUARY, 1950 


Total | Number Total | Number 
65) | 206 6.9% 82 - 9.2% 
65 n 15.6% 83 3 4.™% 
66 n 12.7% 84 3 4.1% 
P 67 64 6.2% 85 9 10.9% 
68 69 11.6% 86 1 1.5% 
; 69 48 16.™% 87 9 15.3% 
70 9.5% 88 4 6.9% 
71 60 10.0% 89 5 4.7% 
72 63 22.2% yo 2 3.3% 
73 70 8.6% 91 4 6.4% 
™ 66 13.6% 92 1 2.2% 
75 88 10.4% 93 1 2.1% 
76 n 12. ™% 
7 56 3.6% 06 1 4.2% 
78 75 9.3% 1 2.0% 
19 55 3.6% 97 0 
80 | 103 7.08% 
100 




ology Examination for January 1950. 

e) Counting scores of 72, 73, and 74 as (75).
In some cases, such as in schools that do not
offer laboratory work, and in “short-term”
courses (veterans' classes) a score of 75 is
required for a passing grade. In many cases
teachers counted 72, 73 or 74 as (75), similar
to counting 62-64 as (65). While this is not a
legal procedure, the State has accepted it in
the past, considering such scores as falling
within a scoring-error range similar to the
62-65 range. However, in a number of cases in
which 75 was not required for passing, the
teachers still recorded 72, 73 or 74 as (75).
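The recording conventions discussed in (d) and (e) can be summarized in a brief sketch. The function `recorded_mark` and its parameters are hypothetical names; the sketch assumes, following the text, a three-point scoring-error range immediately below the passing score, with scores in that range entered as the passing score in parentheses. The application to a passing score of 75 reflects the practice the State is said to have accepted, not a legal rule.

```python
def recorded_mark(raw_score: int, passing_score: int = 65, tolerance: int = 3) -> str:
    """Return the mark to be entered for a raw total (illustrative sketch)."""
    # Scores within the scoring-error range just below the passing score
    # are entered as the passing score in parentheses, e.g. "(65)".
    if passing_score - tolerance <= raw_score < passing_score:
        return f"({passing_score})"
    # All other scores, passing or failing, are entered unchanged.
    return str(raw_score)


print(recorded_mark(63))                    # within the 62-64 range
print(recorded_mark(61))                    # below the range: unchanged
print(recorded_mark(73, passing_score=75))  # where 75 is required
print(recorded_mark(73))                    # where 75 is not required
```

Under this sketch a 61 stays a failing 61, and a 73 is entered as (75) only where a passing score of 75 actually applies. These are the two points on which the papers examined departed from the procedure.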

3. Errors and Inconsistencies in Correction.
The summaries made by the investigators of
scoring errors revealed several types of errors
that could be classified as being actual mistakes
or inconsistencies in scoring the papers.
Examples of such errors follow:

a) Giving credit for parts of an item that
should have been rejected. As indicated
previously, many Part II items contained seven or
eight parts, of which the student was to answer
five or six and omit the remainder. Sometimes
a scorer would fail to note that all parts of such
an item had been answered by a student, would
give credit for all parts, and hence award more
than the maximum allowable number of points
to the student. Such an error occurred in
twenty-two out of 2441 Biology Examinations
for January 1950.

b) Giving credit for an entire item that should
have been omitted. An error similar to (a)
above occurred when a student had the option of
rejecting one or more entire items in Part II.
Occasionally a student would answer all the
items, and again the scorer would correct all of
them, giving the student extra points.

c) Failure to note the omission of entire
items or parts of items. A condition just the
reverse of cases (a) and (b) above occurred when
a teacher did not notice that a student had
omitted an entire item or part of an item. This
type of error occurred quite easily if the teacher
was scoring the paper by deducting points
(rather than recording the number of positive
points awarded, as suggested by the State).
The teacher would simply add the number of
points deducted and subtract from the maximum
number of points allotted the item, failing to
note the omission. Had the paper been scored
by the positive-point method, such an error
would not have resulted. Errors of this type
were found on twenty out of 1699 papers in
Earth Science for June 1949.

d) Failure to correct an item or part of an
item. Occasionally a teacher would overlook
an entire item or part of an item. In such a
case the student did not receive as high a score
as he deserved. This type of error was detected
on fifty-nine out of 2441 Biology Examinations
for January 1950.

e) Errors in awarding points. The items on
Part II of the examinations consist of from two
to eight or nine parts. The maximum point
value of the items is in each case ten. However,
the values of the parts vary from item to
item, depending on the number of parts in the
item or the complexity of the part. For example,
one item might consist of three parts of values
five, three, and two respectively; while another
item might consist of five parts of two points
each. In many cases the teachers awarded
incorrect numbers of points for items; for example,
parts were given three points credit when the
maximum value was only two.

The opposite situation, the awarding of too
few points, was difficult to detect. However,
one bundle of sixty Physics papers was tallied
on which the teacher did not give full credit
(three points) in any case. It is difficult to
believe that all sixty students failed to answer the
item correctly. Hence it seems reasonable to
assume that the teacher thought that the
maximum value of the item was two, rather than
three.

f) Inconsistencies in scoring. Many
inconsistencies in scoring were noted, not only within
the work of a single teacher, but also between
the scoring procedures of different teachers.
An answer that would receive full credit from
one teacher might receive only partial credit or
no credit from another. Such situations were
especially obvious on items that required
drawings or diagrams. Often one teacher would
apparently give more credit for neat, artistic work,
while another would consider only the scientific
accuracy of the drawing.

Similar inconsistencies occurred even within
the work of one teacher. For example, one
Physics Examination included an item involving
units of electricity. One teacher gave credit
for an answer of 2520 watts, but consistently
marked as incorrect answers of 2.52 kilowatts.
Since the students were given no instructions in
the item as to the units to be used, it would
appear that both answers should be considered
equally correct.

g) Obvious errors in scoring. In many cases
errors were detected (1) where a correct
answer was consistently scored by a teacher as
being wrong, and (2) where obviously incorrect
answers were marked correct. It may be
assumed that in such cases the teacher simply
did not know the correct answer to the item.

4. Miscellaneous Errors in Scoring. —Cer- 
tain types of errors appeared that were difficult 
to classify under any of the previous categories. 
Hence they were grouped under this heading. 
They are as follows: 

a) Errors in transposition. In most cases
the examination papers of the individual
students are stapled to a cover page on which the
teacher lists the total Part I score, the scores
awarded on the individual Part II items, and
the Part II total score. These are then totaled
on this cover sheet to show the student's final
score on the entire examination.

Many times errors were made in the transfer
of these partial scores to the cover page.
For example, such errors occurred on 118 out
of 1974 papers for the Chemistry Examination
for June 1949.

It is difficult to determine whether these
errors are due to the carelessness of the scorer,
or whether they are intentional. However, it
is interesting to note that most of these errors
accrued to the benefit of the student, that is, the
score on the cover was higher than the actual
score. Occasionally, however, a scorer would
fail to record the score obtained on an entire
item on Part II. Thus the student received a
lower score than he earned.

b) Obvious “upgrading” of scores. In some
cases it was quite evident that the scorer had
re-marked a paper to give a student a higher
grade than he earned or had simply changed the
total grade to bring it up to a passing score.
Such cases are difficult to construe as anything
but dishonesty on the part of the scorer. For
example, twenty-two such papers were identified
in the Chemistry Examination for June 1949.

c) Failure to grade a paper completely. A
situation somewhat related to (b) was an obviously
dishonest practice that was detected on a few
occasions. In several instances papers were
found in which the scorer had corrected Part I
of the paper only. The scorer then simply
credited the paper with a sufficient number of
points on Part II to bring the total up to the
passing level. In some cases the Part I score was
merely doubled, or if this did not give the paper
a passing score, a sufficient number of additional
points was added. It was obvious that
the items on Part II had not even been read in
some of these cases.


Recommendations 


As a result of these findings the following 
recommendations seem reasonable: 


1. It is recommended that the State of New
York provide a more specific list of instructions
for the scoring procedures to be followed in
correcting the Regents Examinations in Science.

2. It is recommended that a scoring key be
provided for Part II of the examinations, as
well as for Part I. It is realized that because
of the nature of the Part II items, it is difficult
to construct an absolute scoring key. However,
it would seem desirable to prepare a
“scoring guide” that would suggest point values
for whole or partial answers.

3. It is recommended strongly that the State 
continue to spot-check the examination papers 
in an attempt to identify scoring errors and in- 
consistencies, and that, further, they notify 
the science supervisor or administrators in the 
schools in which the errors most frequently oc- 
cur of their type and extent. This may reduce 
the appearance of these errors on future exam- 
inations. 


SECTION VII


AN ANALYSIS OF THE SCORES OBTAINED ON 
THE TEST ITEMS ON THE REGENTS EX- 
AMINATIONS IN SCIENCE 


The Problem 


THE PROBLEM of this phase of the in- 
vestigation is to analyze the individual items on 
the sixteen Regents Examinations in Science that 
were studied, with respect to these points: 


1. The degree of difficulty of the various 
types of items 

2. The discriminating power of the items 

3. The popularity of certain items 


Methods Employed 


In order to determine the degree of difficulty
and the discriminating power of the individual
items of the examinations, the average or
percentage score obtained on each item was tabulated
as described below. For Parts I of the tests
(each of which is composed of fifty short-answer
type items of unitary value) the average
score for each item was determined for each of
the total score groups ((65) through 100) by
dividing the number of students answering the item
correctly by the total number of students
receiving the respective score.

As has been stated, Parts II of all the
examinations are composed of eight or nine
essay-type items each of which bears a total value of
ten points. Of these, the student may select
any five. Each essay item consists of from
two to ten parts of varying point values. To
calculate the percentage scores on these parts, the
total number of points earned by all the students
answering the part was divided by the maximum
number of points that they could have obtained
had they all answered it correctly. This was
done for each item for each total score group
from (65) through 100.
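The two calculations just described may be sketched as follows. The function names and the sample figures are hypothetical; the formulas are simply those stated above, applied within a single total-score group.

```python
def part_one_item_average(num_correct: int, num_students: int) -> float:
    """Average score on a one-point Part I item within one score group."""
    # Students answering correctly divided by students in the score group.
    return num_correct / num_students


def part_two_part_percentage(points_earned, max_points: int) -> float:
    """Percentage score on one part of a Part II item within one score group.

    points_earned holds one entry for each student who answered the part.
    """
    # Points actually earned, divided by the points that would have been
    # earned had every answering student received full credit.
    return sum(points_earned) / (max_points * len(points_earned))


# Hypothetical figures for a single score group.
average = part_one_item_average(num_correct=45, num_students=60)
percentage = part_two_part_percentage([3, 2, 0, 3], max_points=3)
print(average, percentage)
```

Note that the Part II figure uses only the students who attempted the part, since each student selects five of the eight or nine items.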




Degrees of Difficulty of the Items 


The average and percentage scores thus
obtained were analyzed to determine the degree
of difficulty of each item on each of the sixteen
examinations. It is obvious that if the average
or percentage score of any item was consistently
low for all the total score groups, the item
must be difficult. Conversely, if the score was
consistently high, the item must be easy. The
items were categorized arbitrarily as being
“easy,” “of average difficulty,” or “difficult”
on the basis of the following criteria: (1) if
approximately one-half or more of the average or
percentage scores were above .90, the item
was considered easy; (2) if approximately
one-third or more were below .50, the item was
considered difficult; and (3) those falling between
these ranges were considered to be of average
difficulty.
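These criteria can be expressed as a short sketch. The function name `categorize_item` is hypothetical. Because the criteria are stated as approximate, the sketch makes two assumptions of its own: the fractions one-half and one-third are applied exactly, and the “easy” test is applied before the “difficult” test.

```python
def categorize_item(group_scores) -> str:
    """Categorize an item from its scores across the total-score groups."""
    n = len(group_scores)
    high = sum(1 for score in group_scores if score > 0.90)
    low = sum(1 for score in group_scores if score < 0.50)
    # (1) One-half or more of the scores above .90: easy.
    if high >= n / 2:
        return "easy"
    # (2) One-third or more of the scores below .50: difficult.
    if low >= n / 3:
        return "difficult"
    # (3) Everything else: of average difficulty.
    return "of average difficulty"


# Hypothetical sets of score-group averages for three items.
print(categorize_item([0.95, 0.92, 0.91, 0.60]))
print(categorize_item([0.40, 0.45, 0.70, 0.80, 0.30, 0.90]))
print(categorize_item([0.60, 0.70, 0.80, 0.95]))
```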

The items thus categorized were then studied
in an effort to determine whether any one
type seemed to be consistently easy or difficult.
As a result of this analysis, a list was made
that included the general subject-matter areas
containing the largest numbers of difficult items
on the examinations in each of the science fields.
A few examples of each type are cited below:


Difficult Items 


I. Biology 
A. Plant and Animal Phyiology 
1. ‘‘What are two adaptations ofa root 
hair that help it to perform its func - 
tions?’’ (January 1950, Part II, 1 ag) 


. ‘‘Name a plant tissue cell. State its 
special function and describe how it is 
fitted to perform this function. ’’ (June 
1950, Part II, 7 b) 


. ‘*The mouth waters when food is pres- 
ent because sali ary glands have been 
Stimulated by neurons. (Jan- 
vary 1949, Part I, 34) 


B. Genetics and Heredity 
1. ‘‘To the species involved, mutations
are (1) always harmful, (2) always useful,
(3) usually harmful, (4) usually
useful.’’ (January 1950, Part I, 13)


2. ‘‘Give an explanation for the following
true statement(s): In some cases the
marriage of first cousins results in
very desirable offspring; in other
cases very undesirable offspring result.’’
(January 1950, Part II, 3 f)


3. ‘‘An animal has four chromosomes in
each body cell. State the number of
chromosomes in (1) a primary egg cell.’’
(January 1950, Part II, 5 by)


C. General Terminology 
1. ‘‘The process of boiling milk to kill all
bacteria is sterilization.’’ (January
1950, Part I, 50)


2. ‘‘The part of the seed that will develop
into the plant is the ______.’’ (January
1949, Part I, 38)


3. ‘‘An example of an antibiotic is sulfadiazine.’’
(June 1950, Part I, 25)


There were also moderate numbers of diffi- 
cult items in the general categories of compar- 
ative anatomy, bio-chemistry, and history of 
biology. 


II. Chemistry
A. Organic Chemistry
1. ‘‘Hard coal consists chiefly of (1) carbohydrates,
(2) combined carbon, (3) uncombined
carbon, (4) hydrocarbons.’’
(January 1950, Part I, 17)


2. ‘‘Describe one method of making methyl
alcohol.’’ (January 1950, Part II, 6


3. ‘‘Write the structural (graphic) formula
for (1) chloroform, (2) ethylene.’’
(June 1949, Part II, 6 b)


B. Atomic Weights 
1. ‘‘The weight of 22.4 liters of hydrogen
is approximately (1) 0.09 grams, (2) 2
grams, (3) 1 gram, (4) 22.4 grams.’’
(January 1949, Part I, 33)


2. ‘‘The weight of nitrogen compared with
an equal volume of air is approximately
(1) one-half as great, (2) the same,
(3) twice as great, (4) fourteen times as
great.’’ (June 1950, Part I, 47)


C. Commercial Reactions 
1. ‘‘Charcoal is a product of the process
that also produces (1) acetic acid, (2)
coal tar, (3) coke, (4) gasoline.’’ (January
1950, Part I, 14)


2. ‘‘The reaction of carbon monoxide and
hydrogen is used commercially to make
(1) carbonic acid, (2) chlorine, (3) methanol,
(4) soap.’’ (June 1950, Part I, 27)


3. ‘‘Name two products which are obtained
from coal tar.’’ ‘‘State one use for
each product mentioned in c.’’ (June
1949, Part II, 6 c, d)


4. ‘‘What substance may be treated with
chlorine to manufacture bleaching
powder?’’ (June 1950, Part I, 3 e)


D. Laboratory Procedures and Techniques 
1. ‘‘If too much air is allowed in the fuel
mixture, the Bunsen flame will (1) become
colorless, (2) become yellow,
(3) deposit soot, (4) strike back.’’
(June 1949, Part I, 28)


2. ‘‘Give the reagents used in the laboratory
preparation of (1) nitric acid, (2)
ammonia.’’ (June 1949, Part II, 8 d)


3. ‘‘State briefly how to prepare hydrogen
from water and sodium chloride.’’
(June 1950, Part II, 5


E. Equations
1. ‘‘Write a completely balanced equation
for the reaction between copper and
hot concentrated sulfuric acid.’’ (June
1950, Part II, 1 e)


2. ‘‘Write an ionic equation to show what
happens when an oxygen ion is converted
to an O, atom.’’ (June 1949, Part II,
4 c)


Other difficult items include those involving
terminology, characteristics of elements and
compounds, and everyday applications of chemistry.


III. Earth Science
A. Geology (this category included by far the
largest number of difficult items)
1. ‘‘Headwater erosion of a valley glacier
results in the formation of a (an)
______.’’ (January 1950, Part I, 2)


2. ‘‘Explain the following true statement(s):
The Catskill Mountains are
classified as a plateau region.’’ (January
1950, Part I, 5 d)


3. ‘‘Explain how weathered rock may
again become bedrock.’’ (January
1949, Part I, 1 d)


4. ‘‘An intrusion of igneous rock that cuts
across the rock layers is called a (1)
dike, (2) fault, (3) laccolith, (4) sill.’’
(June 1950, Part I, 31)


B. Weather
1. ‘‘Distinguish between absolute and relative
humidity.’’ ‘‘Explain why relative
humidity decreases as temperature
increases.’’ (January 1949, Part
II, 2 c, d)


2. ‘‘Air descending the side of a mountain
becomes compressed. Why does this
make the air comparatively dry?’’ (January
1950, Part II, 6 c)


3. ‘‘Barometric pressure recorded on a
weather-bureau station model as 247
would be read (1) 924.7, (2) 1002.47,
(3) 1024.7, (4) 1247 millibars.’’ (June
1950, Part I, 23)


C. Astronomy
1. ‘‘The planet which is about the same
size as the earth is (1) Mars, (2) Venus,
(3) Mercury, (4) Uranus.’’ (June
1949, Part I, 29)


2. ‘‘Explain the following:
In New York State the altitude of the
noon sun is higher during the summer
than it is during the winter.’’ (January
1949, Part II, 5 b)


IV. Physics
A. Sound
1. ‘‘State two conditions under which two
sound waves of the same amplitude will
produce complete interference.’’ (January
1950, Part II, 6 c; and June 1950,
Part II, 4 d)


2. ‘‘The note produced by a string vibrating
as a whole is called a (an) overtone.’’
(January 1949, Part I, 38)


3. ‘‘Find the fundamental frequency in
vps. of a note produced by a whistle,
closed at one end, if the length of the
air column is six inches. Air temperature
is 20 degrees, C.’’ (June 1950,
Part II, 4 c)


B. Electricity
1. ‘‘During the discharging process of a
lead storage cell, the amount of water
in the cell ______.’’ (June 1949,
Part I, 44)


2. ‘‘An electric heater has two coils with
resistances of 40 ohms and 60 ohms.
The heater operates on a 120-volt circuit.
It is equipped with a switch that
allows either coil to operate in series.’’
‘‘In which of the three possible operating
circuits is the heat developed the
greatest?’’ (January 1950, Part II, 4 d)


3. ‘‘An iron wire has more resistance
than a copper wire of the same dimensions,
and an aluminum wire has more
resistance than the copper wire of the
same dimensions. Compare the current
in the three wires and state in
which wire the most heat is generated
when they are connected to a battery
(1) in series; (2) in parallel.’’ (June
1950, Part II, 5 b)


C. Lenses and Mirrors 
1. ‘‘The image of an object viewed
through a concave lens is always erect
and larger than the object.’’ (June
1949, Part I, 26)


2. ‘‘A woman sees a full-length image of
herself in an upright plane mirror.
The minimum length of the mirror is
(1) exactly the same as, (2) one-half,
(3) twice, (4) independent of the height
of the woman.’’ (January 1950, Part
I, 12)


Other physics items that seemed difficult 
occurred in the areas of heat and mechanics. 

It is interesting to note that the total number 
of items categorized as being ‘‘easy’’ far out- 
numbered those categorized as ‘‘difficult.’’ 
This, of course, is explained partly by the fact 
that the examinations analyzed all received passing
scores between and 100. Hence, the
examination items would naturally consist 
chiefly of those receiving high average and per- 
centage scores. 


Easy Items 


I. Biology 
A. Conservation 
1. ‘‘Contour plowing is done in an effort 
to (1) beautify the farm, (2) control 
weeds, (3) discourage insects, (4) 
- topsoil.’’ (January 1949, Part I, 
6 


2. ‘‘Explain the relationship of forests to
each of the following: (1) flood control, 
(2) prevention of erosion, (3) preser- 
vation of wildlife.’’ (January 1950, 
Part II, 1 b) 


B. Plant and Animal Physiology 
1. ‘‘In mammals the body wastes are ex- 
creted by the lungs, skin and (1) kid- 
neys, (2) pancreas, (3) small intestine, 
(4) stomach.’’ (January 1949, Part I, 
21) 


2. ‘‘The type of cell in the bloodstream
that increases in number in response
to the invasion of bacteria is the
______.’’ (June 1950, Part I, 36)


3. ‘‘State four life functions carried on by
a maple tree.’’ (January 1949, Part II,
1 a)


C. Reproduction and Genetics 
1. ‘‘An important function of the sperm
cell is to supply the egg with (1) a set
of genes, (2) extra cytoplasm, (3) extra
food, (4) important hormones.’’ (January
1949, Part I, 7)


2. ‘‘The union of two unlike sex cells is
called (1) fertilization, (2) maturation,
(3) parthenogenesis, (4) vegetative propagation.’’
(January 1950, Part I, 3)


3. ‘‘Using a keyed and labelled diagram,
show the cross between long and long
radishes.’’ (January 1950, Part II, 9


D. Everyday Applications of Biology 
1. ‘‘It is now possible to keep an area
quite free from flies by the use of (1)
2-4D, (2) DDT, (3) Streptomycin, (4)
sulfa drugs.’’ (January 1949, Part I, 5)


2. ‘‘State the principal health purpose of
each of two of the following: chest x-rays;
Wassermann test; pasteurization
of milk.’’ (January 1949, Part II, 6 c)


II. Chemistry
A. Chemical and Physical Properties of Elements
and Compounds
1. ‘‘Hydrogen sulfide is most easily recognized
by its (1) color, (2) density,
(3) odor, (4) state.’’ (January 1949,
Part I, 1)


2. ‘‘The lightest of the following gases is:
(1) NH3, (2) NO, (3) N2O, (4) NO2.’’
(June 1949, Part I, 1)


B. Chemical Reactions 
1. ‘‘The solution resulting from the reaction
between sodium and water contains
(1) an acid, (2) an anhydride, (3) a base,
(4) a salt.’’ (January 1949, Part I, 10)


2. ‘‘The reaction of a carbonate with an
acid yields (1) carbon dioxide, (2) carbon
monoxide, (3) hydrogen, (4) oxygen.’’
(June 1949, Part I, 16)


3. ‘‘Give three reasons why a chemical
reaction may go to completion.’’ (January
1950, Part I, 4 d)


C. Everyday Applications of Chemistry 
1. ‘‘The growth of a legume, such as clover,
adds to the soil a compound of
(1) nitrogen, (2) phosphorus, (3) potassium,
(4) sulfur.’’ (January 1949,
Part I, 22)


2. ‘‘Goiter may be caused by a diet deficient
in (1) bromine, (2) chlorine, (3)
fluorine, (4) iodine.’’ (January 1950,
Part I, 25)


D. Laboratory Procedures 
1. ‘‘To prepare bromine in the laboratory,
add sulfuric acid to (1) NaBr, (2) NaBr
and MnO2, (3) NaCl and NaBr, (4)
MnBr2.’’ (June 1949, Part I, 37)


2. ‘‘A catalyst used in a preparation of
oxygen is (1) manganese dioxide, (2)
mercuric oxide, (3) potassium chlorate,
(4) potassium chloride.’’ (January
1950, Part I, 33)


3. ‘‘Draw a diagram of the apparatus
used in preparing and collecting ammonia
in the laboratory.’’ (January
1949, Part I, 5 c)


Other easy items in Chemistry included
those involving the writing of balanced equations,
and the knowledge of symbols and formulae.


III. Earth Science
A. Geology
1. ‘‘Physical and chemical action on exposed
rock surfaces by atmospheric
agencies is called (1) erosion, (2) corrosion,
(3) suspension, (4) weathering.’’
(June 1949, Part I, 23)


2. ‘‘The breaking of minerals in such a
way that smooth plane surfaces are
produced is known as (1) cleavage, (2)
fracture, (3) luster, (4) streak.’’ (January
1949, Part I, 17)


3. ‘‘The peeling or splitting-off of outer
layers of rock due to temperature
changes is called (1) cleavage, (2) exfoliation,
(3) faulting, (4) fracture.’’
(June 1950, Part I, 35)


B. Weather 
1. ‘‘Closely spaced isobars on a weather
map indicate ______ winds.’’ (January
1949, Part I, 8)


2. ‘‘When the air is completely saturated
with moisture, the relative humidity
is zero.’’ (June 1949, Part I, 38)


3. ‘‘State two characteristics of weather
that an mT air mass will bring to New
York State.’’ (June 1950, Part II, 2 b)


4. ‘‘Distinguish between weather and climate.’’
(January 1950, Part I, 6 a)


IV. Physics 
A. Electricity and Magnetism 


1. ‘‘The filament now used in most electric
lamps is made of ______.’’
(January 1950, Part I, 24)


2. ‘‘When the south pole of a magnet is
brought near the head of an iron nail,
the head of the nail becomes a south
pole.’’ (June 1950, Part I, 33)


3. ‘‘A step-up transformer used to operate
a neon sign has a turn ratio of 1:100.
The primary voltage is 110 volts. The
primary current is 10 amperes. The
secondary current is .09 ampere. Find
the (b) wattage of the primary; (c) wattage
of the secondary.’’ (January 1949,
Part II, 6 b, c)


B. Mechanics 


1. ‘‘The moment of a 20-pound force pushing
perpendicularly on a lever five feet
from the fulcrum is ______ pound-feet.’’
(June 1949, Part I, 20)


2. ‘‘The theoretical mechanical advantage
of a wheel and axle is 6. The wheel
diameter is 12 inches. The axle diameter
is ______ inches.’’ (January 1950,
Part I, 20)


3. ‘‘A 500 pound weight is drawn up an inclined
plane 15 feet long and 3 feet high.
The effort required is 125 pounds. Find
the actual mechanical advantage.’’
(June 1950, Part II, 3 a)


C. Density 


1. ‘‘As a liquid contracts, its density
______.’’ (June 1949, Part I, 37)


2. ‘‘Two solids show equal apparent losses
of weight when submerged in water.
Their densities must be equal.’’ (June
1950, Part I, 38)


3. ‘‘The apparent weight of 3 cubic feet of
metal submerged in water is 375 pounds.
(Density of water is 62.5 pounds per cubic
foot). Find the (a) volume of water
displaced; (b) weight of water displaced;
(c) weight of metal in air.’’ (January
1949, Part II, 1 a, b, c)


Other easy items in Physics included many 
in the areas of light, sound, and the use of sci- 
entific instruments. 


Following the analysis of the difficulty of the
items as a function of subject-matter, an analysis
was made to determine the relationship between
the forms in which the items were written
and their degrees of difficulty.

Parts I of all the examinations are composed
of short-answer type items such as the multiple
choice, modified true-false, and completion
types. Parts II of the examinations are more
subjective in nature, and include essay-type
items that require more explanations and descriptions;
the drawing or interpretation of diagrams;
mathematical problems; and the writing
of equations.

On the Biology Examinations, it was found 
that the largest percentage of the difficult items 
on Part I were of the completion type, while the 
easiest were the multiple-choice. On the Part 
II items, no particular type of item seemed eas- 
ier or more difficult than the others. 

The only short-answer type of item
found on the Chemistry Examinations was the
multiple-choice type. Hence, no comparison
could be made. On Part II, however, the highest
percentage of the difficult items included those
that required explanations, and those that demanded
the writing of equations.

On Part I of the Earth Science Examinations, 
the modified true-false and the completion items
seemed to be about the easiest for the students;
while the multiple-choice seemed to be
the most difficult. On Part II the percentage of 
difficult essay items was high, but among the 
easy items were those that required drawings 
or the interpretation of diagrams. 

On Parts I of the Physics Examinations, the 
three types of short-answer items were of ap- 
proximately equal difficulty. On Parts II, how- 
ever, the number of easy mathematical items 
was high. A number of mathematical items 
seemed to be difficult, also. 

An overall comparison of the items on Parts
I and II of all the examinations indicates that
there is a substantially higher percentage of difficult
items on Part II than on Part I. Hence,
it appears that, in general, the short-answer
type items are less difficult for most students
than are the ‘‘essay-type.’’ However, the data
just summarized fail to reveal any consistent
trends concerning the degrees of difficulty of
the various types of items. Hence, the specific
form in which the item is written does not
generally appear to be a significant factor in its
degree of difficulty.




Summary 


From an analysis of the degrees of difficulty
of the various examination items, a few generalizations
may be made:


1. Items involving current
science information (such as antibiotics, the hydrogen
bomb, etc.) appear to be more difficult
than other types. A possible explanation may
be the fact that many textbooks are not up-to-date.

2. Many of the difficult items are also ambig- 
uous, and hence present difficulty for the stu- 
dent in answering, as well as problems of scor- 
ing for the teacher. 

3. Items involving the use of scientific atti- 
tudes, applications of knowledge, and the use of 
elements of scientific method are in general 
more difficult than the factual type. 

4. Essay-type items are generally more dif- 
ficult than the short-answer items. 


Discriminating Power of the Examination Items 


In general, there are two different views with 
respect to the concept of discriminating power 
of an examination item: 


1. If an item is designed to measure one pre- 
cise objective, it will ideally cut the examination 
group into two sections, namely, those students 
above a given total score who answer the item 
correctly, and those below the given total score 
who answer the item incorrectly. If the average 
scores for such an ideal item were plotted, the 
resulting histogram would have the following 
configuration: 


[Figure: histogram of average score of item (vertical axis) against total test score (horizontal axis)]


2. If an item is designed to measure a generalized
objective, or a multiple set of objectives;
or if the item is used to test a group of
individuals whose quality and quantity of training
with respect to the objectives differ, then
the average scores of the item
would ideally rise gradually as the total scores
of the group increase. A graph similar to the
following would result:


[Figure: average score of item rising gradually with total test score (horizontal axis)]


The New York Regents Examinations in Sci- 
ence are designed to measure a multiple set of 
objectives. In addition, although guided by a 
State Syllabus, no teacher is required to pre- 
sent a given course of study. Hence, no two 
teachers are likely to teach the same kind or 
amount of science material. Therefore no two 
groups of students who take the examinations 
are likely to have the same training. For this 
reason, the discriminating power of the exam- 
ination items needs to be evaluated in terms of 
the second viewpoint described above. 

Thus, the average scores of the items on 
Parts I of the examinations and the percentage 
scores of the items on Parts II were analyzed 
to determine whether they increased as the total
scores increased from to 100. The following
criteria were established for use in this
study, as a measure of discriminating power:


1. An item was considered to have excellent 
discriminating power, if (after the initial in- 
crease) its average score, in seventy-five per- 
cent of the cases, increased consistently (a one 
to ten percent increase) with each one point in- 
crease in the total examination score. 

2. An item was considered to have moderate
or ‘‘average’’ discriminating power if the increase
in twenty-five percent or more of its
average or percentage scores fluctuated between
ten and twenty-five percent.

3. An item was considered to be a poor discriminator
if the increase in twenty-five percent
or more of its average or percentage
scores fluctuated by more than twenty-five percent.

4. All items that had consistently low or consistently
high average or percentage scores
were also considered to have poor discriminating
power.
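
One possible reading of these four criteria is sketched below. It is an interpretation, not the authors' procedure: the function name is hypothetical, scores are assumed to be percentages ordered by increasing total score, "increase" is taken to mean the change in percentage points between neighboring score groups, the phrase "after the initial increase" is not modeled, and items that satisfy none of the rules are returned as "unclassified" because the article does not say how they were handled.

```python
def classify_discrimination(scores, difficulty="average"):
    """Label an item's discriminating power from its per-group scores.

    `scores`: percentages (0-100), ordered by increasing total score.
    `difficulty`: "easy", "average", or "difficult" for the same item.
    """
    if len(scores) < 2:
        raise ValueError("need at least two score groups")
    # Criterion 4: consistently high or low items are poor discriminators.
    if difficulty in ("easy", "difficult"):
        return "poor"
    steps = [b - a for a, b in zip(scores, scores[1:])]
    n = len(steps)
    steady = sum(1 <= d <= 10 for d in steps)         # one to ten point rise
    moderate = sum(10 < abs(d) <= 25 for d in steps)  # swings of 10 to 25
    wild = sum(abs(d) > 25 for d in steps)            # swings above 25
    if 4 * steady >= 3 * n:      # criterion 1: 75 percent of the cases
        return "excellent"
    if 4 * wild >= n:            # criterion 3: 25 percent or more
        return "poor"
    if 4 * moderate >= n:        # criterion 2: 25 percent or more
        return "average"
    return "unclassified"


print(classify_discrimination([40, 45, 50, 55, 60]))
print(classify_discrimination([40, 55, 50, 65, 60]))
print(classify_discrimination([40, 70, 35, 65, 30]))
```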




Results 


Table XIV summarizes the findings of the an- 
alysis for the discriminating power of the exam- 
ination items. 

Table XIV indicates that on the Biology Ex- 
aminations, the greatest number (approximate- 
ly sixty percent) of the items was found to have 
average discriminating power, while only about 
four percent were found to be excellent discrim- 
inators. Of the poor discriminators, four per- 
cent were so classified because of the great 
fluctuation of their average scores. Approxi- 
mately thirty percent were considered poor be- 
cause they were easy, and two percent because 
they were difficult. It is interesting to note that
of the easy items the vast majority were found
on Part I.

There were only three (about one percent)
of the Chemistry items that could be classified
as excellent discriminators on the basis of the
criteria outlined above. Again, the largest group of
the Chemistry items (about forty-three percent)
was found to be of average discriminating power.
Of the poor discriminators, twenty-four
percent were so classified because their aver- 
age scores fluctuated greatly; about thirty per- 
cent were easy, and about two percent difficult. 
Again, the majority of the easy items appeared 
on Part I of the examinations. 

On the Earth Science Examinations, about 
two percent of the items were considered as be- 
ing excellent discriminators while approximate- 
ly fifty-two percent, average. Twelve percent 
were considered poor because their average 
scores fluctuated greatly, about thirty-one per- 
cent because they were easy, and about one per- 
cent because they were difficult. 

Of the items on the Physics Examinations, 
three percent were found to have excellent dis- 
criminating power, while fifty-three percent 
were found to be average discriminators. Of the 
poor discriminators, nineteen percent were so 
categorized because their average scores fluc- 
tuated greatly, twenty-three percent because 
they were easy, and one percent because they 
were difficult. 

Items in the following areas were classified
as having poor discriminating power:


I. Biology
1. ‘‘A plant embryo with a food supply and a
protective coat is called (1) a fruit, (2) a
seed, (3) an embryo sac, (4) an ovule.’’
(January 1950, Part I, 4)


2. ‘‘Tell whether each of the following is
true or false and give your reasons.’’
‘‘Poison ivy can be destroyed by pouring
salt water on its roots.’’ (June 1949,
Part II, 2 b) (See graph I)


II. Chemistry
1. ‘‘The reaction of the proposed hydrogen
bomb involves a change of hydrogen to
(1) argon, (2) radium, (3) helium, (4) uranium.’’
(June 1950, Part I, 50)


2. ‘‘Describe how to make acetylene.’’
(June 1950, Part I, 7 a)


III. Earth Science
1. ‘‘Feldspar may change to ______ when
acted on by moist air.’’ (January 1950,
Part I, 9)


2. ‘‘Explain why relative humidity decreases
as temperature increases.’’ (January
1949, Part II, 2 d)


IV. Physics
1. ‘‘A balloon will rise until it displaces its
own weight of air.’’ (True-false) (June
1950, Part I, 31)


2. ‘‘The diagrams (of saxophone and violin
sounds) represent the wave patterns of
the same note sounded on two different instruments.
State one respect in which the
sounds are similar. State one respect in
which the two sounds are different.’’ (January
1949, Part II, 7 d)


The following are examples of items showing
excellent discriminating power:


I. Biology
1. ‘‘A tissue whose function is aided by the
extensive branching of its cells is (1)
blood, (2) epithelium, (3) nerve,
(4) smooth muscle.’’ (January 1949, Part
I, 14) (See graph II)


II. Chemistry
1. ‘‘Isotopes of uranium have different (1)
atomic numbers, (2) atomic weights, (3)
numbers of planetary electrons, (4) numbers
of protons.’’ (June 1950, Part I,
48)


III. Earth Science
1. ‘‘The material deposited by a stream at
the base of a mountain forms a (an)
______.’’ (January 1950, Part I, 17)


IV. Physics
1. ‘‘A bottle can hold 120 grams of water.
The same bottle can hold 96 grams of
alcohol. The volume of the bottle is
______ cu. cm. The specific gravity
of the alcohol is ______.’’ (June 1949,
Part I, 18, 19)


TABLE XIV


PERCENTAGES (APPROXIMATE) OF ITEMS OF EXCELLENT,
AVERAGE AND POOR DISCRIMINATING POWER


Type of Test     Excellent        Average          Poor
                 Discriminating   Discriminating   Discriminating
                 Power            Power            Power

Biology
Chemistry
Earth Science
Physics


TABLE XV


THE POPULARITY OF THE ITEMS ON PARTS II OF THE SIXTEEN
REGENTS EXAMINATIONS


Examination    Unpopular    Medium-Low    Average    Medium-High    Popular

Biology, Jan. 1949          5   1,2,3,4,7   6,8   9
Biology, June 1949          1,2,5,6,7   4   4,9
Biology, Jan. 1950          1,2,5,6,7   3,4
Biology, June 1950          6   1   5,7,8,9   3,4   2
Chemistry, Jan. 1949        7   6   4,8   1,2,3,5
Chemistry, June 1949        6   8   3,7   1,2,4,5
Chemistry, Jan. 1950        7   6   5   1,2,5,4
Chemistry, June 1950        7   a   4,6   3,5   1,2
Earth Science, Jan. 1949    2,4,6   5,7   1,5,8
Earth Science, June 1949    5   2,4,7   6   1,3,8
Earth Science, Jan. 1950    4,5,6,7,8   1,2,3
Earth Science, June 1950    1,3,7,8   4,6   2,5
Physics, Jan. 1949          a   1   2,5,6
Physics, June 1949          8   3,5,6,7   4   1,2
Physics, Jan. 1950          7   6,8   1,2,3,4,5
Physics, June 1950          7   1,4,6   2,3,5


Summary 


The following generalizations may be made 
relative to the discriminating power of the items: 


1. Few of the items on any of the examina- 
tions could be considered as having excellent 
discriminating power. 

2. The greatest percentage of the items were 
classified as average or poor discriminators. 

3. There was an extremely small percentage 
of items showing consistently low average scores, 
while a large number of items had consistently 
high average or percentage scores. Of these
latter, the majority were items on Part I.


Popularity of Items 


Since a student has an opportunity to choose
five out of eight or nine items on Part II of the
examinations, it was decided to analyze the items
with respect to their ‘‘popularity’’ with the students.
To do this, the percentages of persons
electing the various items were determined by
dividing the number of students choosing an item
by the total number of students obtaining a
particular total score. This was done for each
of the total score groups from to 100. The
percentages were then categorized on the basis
of the following criteria: 10


1. If the percentage was thirty or below, the 
item was considered to be ‘‘unpopular’’ with a 
single score group. 

2. If the percentage was seventy or above,
the item was considered to be ‘‘popular’’ with a
single score group.

3. Items whose percentages ranged between 
thirty and seventy were considered to be of aver- 
age popularity with a single score group. 


Based on these criteria, the popularity of
each item was tabulated for each total score
from to 100. These tabulations were then
grouped as follows:

1. If the item was popular with seventy-five
percent or more of the score groups, it was listed
as popular.

2. If the item was unpopular with seventy-five
percent or more of the score groups, it was
listed as unpopular.

3. If the item was of average popularity with
seventy-five percent or more of the score
groups, it was listed as being of average popularity.

4. If there were approximately equal numbers
of items in both the unpopular and average
categories, it was called an item of ‘‘medium-low
popularity.’’

5. If there were approximately equal numbers
of items in both the popular and average
groups, it was considered to be of ‘‘medium-high
popularity.’’


[Graph I: item b, Part II, Biology Examination, June (poor discrimination power)]

[Graph II: scores on item 14, Part I, Biology Examination, January 1949]


A listing of the items together with their re- 
spective classifications is found in Table XV. 
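
The two-stage popularity rules described above can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, "approximately equal" is modeled by an assumed `tolerance` (a share of the score groups) that the article does not specify, and combinations the five rules do not cover are returned as "unclassified".

```python
from collections import Counter


def group_label(percent):
    """Stage 1: popularity of an item within a single score group."""
    if percent <= 30:
        return "unpopular"
    if percent >= 70:
        return "popular"
    return "average"


def classify_popularity(percents, tolerance=0.10):
    """Stage 2: overall label from the percentages for all score groups.

    `percents` holds, for each total-score group, the percentage of
    students in that group who chose the item.
    """
    if not percents:
        raise ValueError("percents must not be empty")
    n = len(percents)
    counts = Counter(group_label(p) for p in percents)
    # Rules 1-3: one label covers seventy-five percent or more of the groups.
    for label in ("popular", "unpopular", "average"):
        if 4 * counts[label] >= 3 * n:
            return label
    unpop, avg, pop = counts["unpopular"], counts["average"], counts["popular"]
    # Rules 4-5: two neighboring labels occur about equally often.
    if abs(unpop - avg) <= tolerance * n and pop < min(unpop, avg):
        return "medium-low"
    if abs(pop - avg) <= tolerance * n and unpop < min(pop, avg):
        return "medium-high"
    return "unclassified"


print(classify_popularity([80] * 8 + [50] * 2))
print(classify_popularity([20, 25, 10, 30, 50, 60, 40, 55]))
print(classify_popularity([75, 80, 90, 70, 50, 60, 40, 55]))
```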

An analysis of Table XV reveals that the ma- 
jority (nineteen out of thirty-six) of the Biology 
items were considered to be of average popular- 
ity, while four were unpopular, and six popular. 

On the Chemistry Examinations, the largest 
number of items, fourteen of thirty-two, were 
considered to be popular, four unpopular, and 
seven of average popularity. 

It is interesting to note that there were no 
unpopular items on the Earth Science Examina- 
tions. Fifteen of thirty-two items were of av- 
erage popularity, and eleven were popular. 

Of the Physics items, thirteen of thirty-two 
were popular, ten were of average popularity, 
and three were unpopular. 

The table reveals also that more of the low- 
numbered items are in the popular or medium- 
high categories; while the high-numbered items 
(those appearing at the end of the examinations) 
are more often unpopular. This is, of course,
partly explained by the fact that many students
(particularly those obtaining the higher total
scores) answer the items in order (1, 2, 3, 4, 5),
thus omitting those with higher numbers.

A further survey of the content of the popular
and unpopular items indicates that, in general,
the popular items are those concerned
with the knowledge of factual information. The 
following are examples: 


I. Biology 
‘‘In dogs, wire hair is dominant over smooth
hair. A wire-haired dog is crossed with a
smooth-haired dog. Show by keyed diagrams
the cross which would result in: (1) a litter
in which no smooth-haired pups could appear.
(2) A litter in which a smooth-haired
pup could be found.’’ (June 1950, Part I, 2
a)


II. Chemistry
‘‘Answer two of the following: (The atomic
weights from the reference tables may be
used to the nearest whole numbers, e.g., Cl
= 35.457 becomes 35.) (a) How many grams
of sodium hydroxide will be needed to neutralize
189 grams of nitric acid? (b) How
many cubic feet of oxygen will be required
for the complete combustion of 17 cubic feet
of carbon monoxide? (c) How many liters of
hydrogen sulfide gas will react with 99.3
grams of Pb(NO3)2?’’ (June 1950, Part II, 2)




III. Earth Science
‘‘The following questions refer to the accompanying
map: (a) Distinguish between contour
line and contour interval. (b) State the contour
interval of this map. (c) What is the highest
possible elevation of hill A? How much
higher or lower is hill B than hill A?
What does the map symbol at C represent?’’
(June 1949, Part II, 8)


IV. Physics 

‘‘A pulley system is used by a workman to
raise a weight of 240 lb. a vertical distance
of 24 feet. The workman's effort of 120 lb.
moves through a distance of 72 feet. Find (1)
the ideal mechanical advantage, (2) the actual
mechanical advantage, (3) the efficiency of
the pulley system.’’ (January 1950, Part II,
1 a)


The unpopular items were found to be those
involving the application of information, the use
of elements of scientific method, the use of scientific
attitudes, and the use of science in industry.
The following are examples:


I. Biology 
‘‘A boy without a microscope wants to find out
if there are bacteria on his fingers. (1) What
is a culture medium and how is it sterilized?
(2) List two important steps in his experiment
following this sterilization. (3) What would
indicate the probable presence of bacteria?
(4) What evidence would he require to justify
a conclusion that the bacteria had come only
from his fingers?’’ (January 1950, Part II,
8 a)


II. Chemistry
‘‘(a) Describe a process for making ethyl alcohol
from molasses. (b) Name a by-product
of this reaction. Give a use for the by-product.
(c) Describe the manufacture of soap,
mentioning the raw materials, the use of salt,
and the by-product.’’ (January 1949, Part II,
7 a, b, c)


III. Physics
‘‘(a) An electric motor drives a d-c generator
which is used to charge a lead storage battery.
(1) State step by step three useful energy
changes that occur, beginning with the input
to the motor and ending with the energy in the
battery. (b) Describe a laboratory experiment
that may be used to illustrate two factors
that affect the magnitude of an induced
emf.’’ (June 1950, Part II, 7 a, b)


In addition to the above analysis, still another
survey was made regarding the relationship
of popular and unpopular items with their
degrees of difficulty. It is interesting to note
that all or part of sixty-seven percent of the
unpopular or ‘‘medium-low’’ items were also
considered as being difficult. Hence, as one
would expect, it appears that students tend to
avoid the more difficult items. Of the popular
or ‘‘medium-high’’ items, sixty-four percent
appeared among the easy.


Summary


A general review of the data concerning the 
popularity of items indicates the following: 


1. The largest number of items were classified
as being of average popularity; the second
largest number, as popular.

2. In general, the popular items appeared 
early in the examinations, while the unpopular 
items usually appeared near the end. This would 
seem to indicate that many students answered 
the items in order, thus omitting the last items. 

3. As a rule, the unpopular items were the
so-called ‘‘thought’’ items, or those requiring
the application of facts and principles, or the
use of elements of scientific method and
scientific attitudes. The popular items were more
often of the memory or factual type.


SECTION VIII 


SUMMARY AND CONCLUSIONS 


IN AN investigation as extensive as this
one, any effort to prepare a detailed summary
would be redundant. At the end of each section,
the directors have listed a number of conclusions
and implications that apply to the respective
section. Thus the summaries and conclusions
that follow constitute little more than an
epilogue, or perhaps a few philosophical
observations.


1. This investigation, at least from the
viewpoints expressed by the science teachers of
New York State, fails to show that the Regents
Examinations in Science are as distasteful as
many educators have implied. Most of the science
teachers, with various types and degrees of
reservations, seem to believe that the science
program in New York State profits by the presence
of the examinations. A number of teachers do
suggest specific changes, particularly that the
examinations should place greater emphasis on
measurement of general education objectives. Yet
such complaints apply to all educational
programs and at all levels.

2. The phases of the investigation that dealt
with the objective characteristics of the
examinations indicate that they are far more
reliable and valid than teacher-made tests and
compare favorably with the commonly used
standardized examinations in science. While the
discriminatory power of the items on the various
examinations did not prove to be high, that of
the items on standardized examinations fails to
be much better.

3. In general, the examinations are not
prejudicial to the interests of any particular
group within New York State. While boys from the
large high schools seemed to have the greatest
achievement, and girls from small high schools
the least, the superiority and inferiority were
neither consistent nor especially marked. The
examinations appear to be as good (or bad) for
one group as for another.

4. It does seem that a better system for
scoring the examinations is indicated. Apparently
teachers have been ‘‘on their own’’ more than
may be considered desirable, and as a result a
number of irregular scoring practices have
occurred. Yet none of these practices has been
sufficiently widespread to cast doubt on the
integrity of the science teachers of New York
State as a whole.

It would seem, as a final statement, that this 
study failed to elicit the slightest bit of evidence 
that the examination system in New York State 
should be abolished. While it has revealed that 
the system of Regents Examinations in Science 
has weaknesses, the weaknesses are relatively 
the same as those that could be found with any 
mode of evaluation. 


FOOTNOTES 


1. Miller, David John and Mallinson, George
Greisen. ‘‘An Investigation of the Attitudes of
Teachers Toward the New York State Regents’
Examinations in Science,’’ Science Education,
XXXVI (October 1952), 203-215.

2. Guilford, J. P. Fundamental Statistics in
Psychology and Education (New York: McGraw-Hill
Book Co., 1950), 161-164.

3. Peters, C. C. and Van Voorhis, W. R.
Statistical Procedures and Their Mathematical
Bases (New York: McGraw-Hill Book Co.,
1940), 398.

4. Credit is due Professor Conway Sams,
associate professor of mathematics at Western
Michigan College of Education, for helping to
develop the statistical design, and for
computing the corrections for the analysis of
variance with unequal numbers of replications.

5. Lindquist, E. F. Design and Analysis of
Experiments in Psychology and Education
(Boston: Houghton-Mifflin Co., 1953), 108-120.

6. Snedecor, George W. Statistical Methods
(Ames, Iowa: Collegiate Press, 1946), 289-293.

7. The directors wish to express their
gratitude to the many science teachers of New
York State who contributed their time and
efforts to evaluating the word lists.

8. Flesch, Rudolph. The Art of Plain Talk
(New York: Harper and Brothers, 1946),
vii + 210.

9. Buckingham, B. R. and Dolch, E. W. A
Combined Word List (Boston: Ginn and Co.,
1938), iii + 185.

10. Examinations. University of the State of
New York, Division of Examinations and
Testing, Albany, New York, December 1951.

11. It should be noted that since each student
must select five of the eight or nine items,
the chance factor would result in the selection
of any item by approximately fifty-five
percent. However, the cutting scores for
this analysis of popularity have been set
arbitrarily at thirty and seventy percent and
hence make allowance for the chance factor.
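
The chance figure in the last footnote can be checked with a short calculation. The sketch below assumes that a student choosing at random picks five items with equal probability from those offered, so that any one item is selected with probability 5/n. The function name is illustrative only and does not come from the study.

```python
from fractions import Fraction


def chance_selection_rate(items_chosen: int, items_offered: int) -> Fraction:
    """Probability that one particular item is among those picked at random."""
    return Fraction(items_chosen, items_offered)


# Five items are to be answered out of either eight or nine offered.
for offered in (8, 9):
    rate = chance_selection_rate(5, offered)
    print(f"5 of {offered}: {float(rate):.1%}")
```

With nine items the rate is about 55.6 percent, which agrees with the ‘‘approximately fifty-five percent’’ of the footnote; with eight items it is 62.5 percent. Both values lie between the thirty and seventy percent cutting scores.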




A COMPARISON OF WECHSLER CHILDREN’S 
SCALE AND STANFORD-BINET SCORES 
FOR EIGHT- AND NINE-YEAR OLDS 


FRANK C. ARNOLD 
Bowling Green State University 
WINIFRED K. WAGNER 
Fremont, Ohio 


THE HIGHLY verbal nature of the Stanford-Binet
Intelligence Scale has long been considered
a drawback in the psychological testing
of certain children. The examiner in public 
schools must frequently use performance tests 
in cases where the Stanford-Binet seems inade- 
quate because of its predominantly verbal char- 
acter, e.g., the testing of children who work un- 
der a language handicap, those handicapped by 
speech and hearing defects, or those in whom 
the development of verbal and non-verbal abili- 
ties has been unequal (1). 

The Wechsler Intelligence Scale for Children
(7), or WISC, on the other hand, gives a verbal
and a performance score as well as a total score.
The question arises, however, as to the degree
of relationship between scores derived from
this scale and those obtained with the
Stanford-Binet, a scale already widely accepted.
It is the purpose of this study to ascertain the
relationship between these two scales for a
sample of eight- and nine-year olds.

Several studies reported in the literature
show a relatively high relationship between the
two scales. Cohen and Collier (2), using first
and second graders, report Pearson correlation
coefficients between IQ’s on the Stanford-Binet
and WISC of .82 for the verbal scale, .80
for the performance scale, and .85 for the full
scale. Frandsen and Higginson (3) found
correlations of .71, .63, and .80 respectively
between verbal, performance, and full-scale
WISC IQ and Stanford-Binet IQ with unselected
fourth graders. In a summary of four studies,
Pastovic and Guthrie (5) report r’s ranging
from .63 to .83 between the Binet and the
WISC Verbal Scale, from .57 to .75 for the
performance scale, and from .71 to .88 for the
full scale. Krugman, Justman, Wrightstone, and
Krugman (4) obtained correlations of the two
scales for various age levels from 5 years, 6
months to 15 years, 6 months. They report
r’s ranging from .65 to .90 for the verbal scale,
.60 to .75 for the performance scale, and .75
to .90 for the full scale. Findings similar to
these are reported by Weider, Moller, and
Schramm (8).


Procedure 


In the present study, a random sample of fifty 
children was selected from the eight- and nine- 
year olds in the third and fourth grades of the 
Bowling Green, Ohio, elementary school system. 
The Stanford-Binet Intelligence Scale (Form L)
and the Wechsler Intelligence Scale for Children
were then administered to each of the fifty
subjects. All tests were administered by one 
person and were scheduled so that not more than 
one week elapsed between administration of the 
two scales. The Stanford-Binet administration 
preceded the Wechsler for one-half the subjects 
while the Wechsler administration preceded the 
Stanford-Binet for the remaining subjects. The 
order of administration was set on the basis of 
odd-even number of the subject. 


Results 


Means and standard deviations of the IQ’s
obtained by children in this sample on the two
scales are presented in Table I. Comparison of
these data with those reported by Terman and
Merrill (6) and Wechsler (7) indicates that the
obtained results are quite similar.

In Table II* are presented the correlations
between the Stanford-Binet and WISC scores with
which we are concerned here. For purposes of
comparison with reliability data of the
Stanford-Binet, correlations between IQ’s have
been used as well as those between mental ages
and scaled scores. The IQ correlations have also
been corrected to take into account the
differences between the standard deviations
obtained with this group and those reported by
Wechsler (7).
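
The article does not say which correction formula was applied. The following is a minimal sketch only. It assumes the customary correction for restriction of range and a reference standard deviation of 15 IQ points for all three WISC scales. Both are assumptions, not statements from the source, and the function and variable names are illustrative.

```python
import math


def correct_for_sd_difference(r: float, sd_sample: float, sd_reference: float) -> float:
    """Adjust r for the difference between the sample SD and a reference SD
    (standard restriction-of-range formula; an assumption, not from the article)."""
    k = sd_reference / sd_sample
    return r * k / math.sqrt(1.0 - r ** 2 + (r * k) ** 2)


WISC_REFERENCE_SD = 15.0  # assumption: WISC IQ's are scaled to an SD of 15

# (uncorrected IQ correlation from Table II, sample SD from Table I)
observed = {
    "Verbal": (0.85, 12.75),
    "Performance": (0.75, 15.40),
    "Full": (0.88, 13.59),
}

corrected = {
    scale: correct_for_sd_difference(r, sd, WISC_REFERENCE_SD)
    for scale, (r, sd) in observed.items()
}

for scale, value in corrected.items():
    print(f"{scale:12s} corrected r = {value:.2f}")
```

Under these assumptions the sketch gives .88, .74, and .90, the corrected values shown in Table II. That agreement suggests, but does not prove, that a correction of this kind was the one used.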

In assessing the interchangeability of
two scales for use in working with children,
a logical approach would seem to be that of
determining whether the relationship between the
two scales differs significantly from the
reliability coefficient of one of the scales. Our
concern here would be a comparison of the
correlation coefficients obtained between the
WISC and the Stanford-Binet and between Forms L
and M of the Stanford-Binet. If the relationship
found between the WISC and Binet is not
significantly different from that between two
forms of the Binet, then the use of the WISC as a
substitute for the Binet in assessing IQ would
seem reasonable. If, on the other hand, results
do differ significantly, then other factors must
certainly be considered in the substitution of
one scale for the other.

*The correlation of .69 between the performance scale of the WISC and the Stanford-Binet is a correction
of results reported in Winifred K. Wagner, ‘‘A ... of Stanford-Binet Mental ... and Scaled Scores of the
Wechsler Intelligence Scale for ...’’ (unpublished ... thesis, ... University).


TABLE I


COMPARISON OF IQ’s OBTAINED BY FIFTY CHILDREN ON STANFORD-BINET AND
WECHSLER INTELLIGENCE SCALE FOR CHILDREN


                      Stanford-               WISC
Measure               Binet       Verbal   Performance   Full

Mean                  104.52      101.88     104.70      103.34
Standard Deviation     15.66       12.75      15.40       13.59


TABLE II


CORRELATIONS BETWEEN WECHSLER INTELLIGENCE SCALE
FOR CHILDREN AND STANFORD-BINET


                                  WISC
Item Correlated       Verbal   Performance   Full

Mental Age with
  Scaled Score         .77        .69        .81
IQ                     .85        .75        .88
IQ (Corrected)         .88        .74        .90

The r of .93 given as the median value of
relationship between Forms L and M of the Binet
for ages six to sixteen was used as a basis for
this comparison. The corrected correlation
coefficients reported in Table II were transformed
to Fisher z scores, and differences were computed
between the z scores equivalent to these
correlations and the z equivalent to a correlation
of .93. From these differences, critical ratios
were computed using the standard error of the
difference between z’s. The critical ratio found
between the Binet reliability r of .93 and the
corrected r of .88 obtained between the WISC
Verbal Scale and the Stanford-Binet was 1.74,
significant at the 10% level of confidence; that
for the corrected r of .74 between the WISC
Performance Scale and the Binet was 4.37,
significant at the .1% level of confidence; and
that for the corrected r of .90 between the WISC
Full Scale IQ and the Binet was 1.15, which does
not reach significance at the 10% level of
confidence.
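
The procedure just described can be sketched as follows. The article does not report the number of cases behind the Binet reliability coefficient, so `N_BINET = 200` below is purely a hypothetical value chosen for illustration. The sketch also assumes the usual standard error for the difference between two independent z’s. All names are illustrative.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's z transformation of a correlation coefficient."""
    return math.atanh(r)


def critical_ratio(r_a: float, n_a: int, r_b: float, n_b: int) -> float:
    """Difference between two independent r's in z units, divided by the
    standard error of that difference."""
    se_diff = math.sqrt(1.0 / (n_a - 3) + 1.0 / (n_b - 3))
    return abs(fisher_z(r_a) - fisher_z(r_b)) / se_diff


N_WISC = 50      # sample size given in the Procedure section
N_BINET = 200    # HYPOTHETICAL: not reported in the article
R_BINET = 0.93   # median Form L / Form M correlation cited in the text

corrected_r = {"Verbal": 0.88, "Performance": 0.74, "Full": 0.90}

ratios = {
    scale: critical_ratio(R_BINET, N_BINET, r, N_WISC)
    for scale, r in corrected_r.items()
}

for scale, cr in ratios.items():
    z = fisher_z(corrected_r[scale])
    print(f"{scale:12s} z = {z:.3f}   critical ratio = {cr:.2f}")
```

With the assumed Binet sample size the ratios come out at roughly 1.74, 4.36, and 1.15, close to the reported 1.74, 4.37, and 1.15. A different value of `N_BINET` would shift them, so the agreement should be read as illustrative rather than as confirmation of the sample size.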


Discussion 


Data presented in Table I would seem to in- 
dicate that results obtained from this sample of 
children on the Stanford-Binet are quite similar 
to those reported for the standardization group 
(6). 

Data concerned with the relationship between the
two scales are similar to those found by other
investigators. Whether mental ages and scaled scores
or IQ’s are used, correlation coefficients are
large enough to indicate a marked relationship 
between the two scales. This is particularly 
true for the verbal scale and full scale scores 
on the WISC. A squaring of the corrected r’s 
given in Table II shows the common variance 
between the WISC and the Binet to be 77% for the 
verbal scale, and 81% for the full scale. 
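
The common-variance figures follow directly from squaring the corrected coefficients. The short sketch below repeats that arithmetic; the performance-scale value it prints is computed here for comparison and is not a figure reported in the article.

```python
# Corrected IQ correlations from Table II.
corrected_r = {"Verbal": 0.88, "Performance": 0.74, "Full": 0.90}

# Common variance is the square of the correlation coefficient.
common_variance = {scale: r ** 2 for scale, r in corrected_r.items()}

for scale, share in common_variance.items():
    print(f"{scale:12s} {share:.0%}")
```

The verbal and full scale values round to 77% and 81%, as stated above, while the performance scale shares only about 55% of its variance with the Binet.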

So far as this sample is concerned, the
relationship between IQ’s obtained for eight- and
nine-year olds with the WISC and the Form L
Binet is not significantly different from the
relationship between IQ’s obtained on Forms L and
M of the Binet. So far as total score is
concerned, then, the WISC might very well be
substituted for the Binet or the Binet for the
WISC. From results of this study, the same would
seem to be true for the WISC Verbal Scale. This
would not seem to be true, however, for the
WISC Performance Scale, since the relationship
found differs significantly from that between
Forms L and M at the .1% level of confidence.

Clinically, it would seem that these findings
have practical implications for the use of the
various scales concerned. Total scores on the
WISC or scores on the WISC Verbal Scale and
the Binet would seem close enough to each other
to offer practical interchangeability of the two
scales. At the same time, the WISC Performance
Scale would appear to be getting at a different
facet of intelligence than is the total or verbal
score of the WISC or the total score of the
Binet. This study has not concerned itself with
what these different scores may mean so far as
prediction of various kinds of behavior is
concerned. However, if broadened prediction is
possible with the performance scale of the WISC
while at the same time the total score and
verbal score closely approximate that of a
well-accepted tool, the WISC may prove to be a
quite useful clinical instrument. Further research
is necessary, of course, both to check findings
of the present study and to determine the
meaning of sub-scale scores of the WISC.


REFERENCES 


1. Arthur, Grace. A Point Scale of Performance
Tests, Clinical Manual (New York:
The Commonwealth Fund, 1943).

2. Cohen, B. D. and Collier, Mary J. ‘‘A Note
on the WISC and Other Tests of Children
Six to Eight Years Old,’’ Journal of Consulting
Psychology, XVI (1952), pp. 226-27.

3. Frandsen, A. N. and Higginson, J. B. ‘‘The
Stanford-Binet and the Wechsler Intelligence
Scale for Children,’’ Journal of Consulting
Psychology, XV (1951), pp. 236-238.

4. Krugman, Judith I. and others. ‘‘Pupil
Functioning on the Stanford-Binet and the
Wechsler Intelligence Scale for Children,’’
Journal of Consulting Psychology, XV (1951),
pp. 475-483.

5. Pastovic, J. J. and Guthrie, G. M. ‘‘Some
Evidence on the Validity of the WISC,’’
Journal of Consulting Psychology, XV (1951),
pp. 385-386.

6. Terman, L. M. and Merrill, Maude A.
Measuring Intelligence (New York:
Houghton-Mifflin, 1937).

7. Wechsler, D. Wechsler Intelligence Scale
for Children, Manual (New York: The
Psychological Corporation, 1949).

8. Weider, A. and others. ‘‘The Wechsler
Intelligence Scale for Children and the
Revised Stanford-Binet,’’ Journal of Consulting
Psychology, XV (1951), pp. 330-333.




ERRATA


We regret that the following three tables were inadvertently left out of author Evan R. Keislar’s
article ‘‘Peer Group Rating of High School Pupils with High and Low School Marks,’’ published in
the June 1955 Journal of Experimental Education.


TABLE I 


CORRELATIONS OF EACH OF TWELVE TRAIT RATINGS WITH OTIS I.Q. AND 
SCHOOL MARKS FOR 126 BOYS AND 128 GIRLS 


Otis I.Q. School Marks
Trait Boys Girls Boys Girls
1. Talkative - silent 02 06 .10 -.22
2. Old acting - young acting .10 .17 .21 .06
3. Friendly - unfriendly .03 .14 .09 .24*
4. Likes schoolwork - dislikes schoolwork .38* . 159 .
5. Considerate - inconsiderate .17 .09 .36* .26*
6. Popular - unpopular (with opposite sex) -.12 .13 -.07 -.10
7. Persistent - not persistent . .23* .63* .49*
8. Welcomed - ignored (by same sex) 09 .18 .03
9. Puts studies first - puts studies last .30* .36* .70* .10*
10. Conceited - not conceited -.16 04 -.22
11. Cheerful - sad 02 .08 06 .09
12. Boys athletically competent - incompetent -.22 -.07
12. Girls influential - not influential .20 .31*


*Significantly different from zero at the .01 level.


TABLE II


DIFFERENCES ON TRAIT RATINGS BETWEEN TWO GROUPS OF HIGH SCHOOL
GIRLS MATCHED FOR OTIS I.Q. BUT DIFFERING IN SCHOOL MARKS


Based on 27 girls in each group 


                          School                                       Level of
Trait                     Marks    Mean     σ      D     S_D     t     Signif.

Talkative -               Low      57.3    9.0    7.8    3.3    2.39    .05
  Silent                  High     49.5   12.9
Old acting -              Low      51.9    6.4    1.5    1.8     .87     -
  Young acting            High     50.4    6.3
Friendly -                Low      53.0    7.0
  Unfriendly              High     57.5    7.8    4.5    1.9    2.39    .05
Likes schoolwork -        Low      39.9    5.6
  Dislikes schoolwork     High     58.0    8.3   18.1    1.8    9.87    .001
Considerate -             Low      49.4    6.7
  Inconsiderate           High     53.0    5.2    3.6    1.6    2.24    .05
Popular -                 Low      55.0    9.9    6.2    2.2    2.84    .01
  Unpopular (with         High     48.8    9.9
  opposite sex)
Persistent -              Low      46.8    5.0
  Not persistent          High     54.9    5.7    8.2    1.2    6.66    .001
Welcomed -                Low      52.3    4.1
  Ignored (by same sex)   High     54.0    a3..4.7 2.4
Puts studies first -      Low      44.4    4.4
  Puts studies last       High     54.4    4.6   10.0    .94   10.62    .001
Conceited -               Low      51.0    6.3    3.7    4.7... de      .05
  Not conceited           High     47.3    5.3
Cheerful -                Low      53.2    5.8
  Sad                     High     54.2    6.0    1.0    1.6     .61     -
Influential -             Low      48.8    4.5
  Not influential         High     53.4    5.9    4.6    1.2    3.79    .001


Note: All figures reported have been rounded off to one decimal place except for
the values of t.


TABLE III


DIFFERENCES ON TRAIT RATINGS BETWEEN TWO GROUPS OF HIGH SCHOOL
BOYS MATCHED FOR OTIS I.Q. BUT DIFFERING IN SCHOOL MARKS


Based on 35 boys in each group 


                            School                                       Level of
Trait                       Marks    Mean     σ      D     S_D     t     Signif.

Talkative -                 Low      64.0   11.9    3.1    2.9    1.07
  Silent                    High     50.8   11.6
Old acting -                Low      46.2    8.0
  Young acting              High     50.8    8.1    4.6    2.1    2.25    .05
Friendly -                  Low      50.2    6.6
  Unfriendly                High     52.4    7.2    2.2    1.7    1.33
Likes schoolwork -          Low      43.5   10.3
  Dislikes schoolwork       High     57.3    9.2   13.8    2.2    6.31    .001*
Considerate -               Low      47.8    5.6
  Inconsiderate             High     51.9   68.5    4.1    13    63.11
Popular -                   Low      48.8    8.1
  Unpopular (with           High     49.1    8.1    3     32.3     .13
  opposite sex)
Persistent -                Low      46.6    4.7
  Not persistent            High     52.0    4.9    5.4     .9    5.91    .001
Welcomed -                  Low      49.4    6.7
  Ignored (by same sex)     High     52.8    6.8    3.4    1.7    1.98     -
Puts studies first -        Low      45.0    5.7
  Puts studies last         High     54.3    5.8    9.3    1.4    6.47    .001
Conceited -                 Low      51.5    5.4    3.0    1.3    2.36    .06**
  Not conceited             High     48.5    5.1
Cheerful -                  Low      51.8    4.5
  Sad                       High     52.4    6.4    1.6    1.0     .63     -
Athletically competent -    Low      47.7    6.7
  Athletically incompetent  High     51.2    9.9    3.5    2.0    1.71     -

Note: All figures reported have been rounded off to one decimal place except for
the values of t.


* For the distribution of trait scores the hypothesis of normality could be rejected
at the .02 level but not at the .01 level.

**For the distribution of scores the hypothesis of normality could be rejected at
the .05 level but not at the .02 level.




Specifications for Manuscripts

for the...

JOURNAL OF EDUCATIONAL RESEARCH
and the...

JOURNAL OF EXPERIMENTAL EDUCATION


1. All manuscripts must be typewritten, double spaced, and on one side
of the sheet only. Mimeographed and ditto sheets are acceptable only
when very clearly printed.

2. All unusual symbols or formulae must be very clearly typed or hand
printed in black ink. To avoid costly printers’ composition charges it
may be necessary for us to make cuts of difficult matter, or to print
your material by the photo-offset lithography method. The latter means
photographing your actual copy. It is expensive to have material
redrawn by our own artists, and retracing or duplicating increases the
hazards of error. See that your copy is correct and complete as you
wish to have it reproduced. The men who work on your manuscripts
are not trained to understand the working symbols and language of
your technical field.

3. The same restrictions and requirements as in Paragraph 2 apply to all
drawings, graphs or other illustrated materials; they must be neatly
done, in black ink, on bond paper or tracing cloth suitable for
reproduction. Remember our magazines are printed in black ink only. Color
graphs should be changed by the author to provide different kinds of
shading for the different areas. For example: diagonal lines for red,
vertical lines for blue, etc. Provide a key.

4. All tables, graphs, etc., on sheets by themselves must be properly labeled
and identified in relation to the written copy of the manuscript.

5. Footnotes must be complete as to author, title, place of publication,
publisher, date and pages. They must be numbered consecutively
throughout the article.

6. Bibliographical notes must be complete and arranged alphabetically.

... earnestly required. It is difficult to produce technical journals accurately,
neatly, and on time under the best conditions. Promptness in printing,
economy, and accuracy will be promoted by carefully prepared manuscripts.


